

# Implementation of Discrete Cosine Transform & Different CORDIC Algorithms

Amber Pathariya<sup>1</sup> and Prof. Abhishek Agwekar<sup>2</sup>

Electronics & Communication Engineering, RGPV, Bhopal, Madhya Pradesh, India<sup>1</sup>
Electronics & Communication Engineering, RGPV, Bhopal, Madhya Pradesh, India<sup>2</sup>

*amberpathariya@gmail.com<sup>1</sup>*, *abhishek.agwekar@trubainstitute.ac.in<sup>2</sup>*

Abstract: Real-time data processing requires special-purpose hardware that combines hardware efficiency with high throughput. Architectures that include multipliers, for instance Chen's algorithm, have a less regular structure because of complex routing and require a large silicon area. On the other hand, the DCT architecture based on distributed arithmetic (DA), which is also a multiplier-less architecture, has the inherent disadvantage of lower throughput because of the ROM access time and the need for an accumulator. To eliminate or reduce these limitations of discrete cosine transform design, this paper proposes and presents several research objectives. The main objective is to design an efficient discrete cosine transform based on a modified CORDIC algorithm; the design consists of a special serial arithmetic unit with three shift registers, three adders/subtractors, a look-up table and special interconnections. The performance of different multiplier-based and DCT topologies is compared and analyzed in terms of the number of slices, the number of LUTs, the number of bonded input/output blocks (IOBs) and the maximum combinational path delay (MCPD). The aim is to reduce the delay and the slice count (area) of the DCT architecture based on the modified CORDIC algorithm.

## **1. INTRODUCTION**

In recent years, a lot of research has been done on low-power DCT designs. For VLSI implementation, the Flow-Graph Algorithm (FGA) is the most popular way to realize the fast DCT (FDCT) [2]. Because of the rapid progress of VLSI technology, current research has been directed at highly demanding real-time, computation-intensive applications such as digital signal processing (DSP), graphics processing and communication. Meanwhile, the performance and efficiency of computers for these applications have continually evolved to meet high demands in terms of speed and accuracy [5]. The discrete cosine transform (DCT) is the most widely used transformation technique in the field of signal processing, especially for image data. As multimedia applications on portable low-power devices become more prominent, the need for efficient low-power image encoding and decoding techniques increases. The use of the DCT has been studied extensively, and the corresponding algorithms involve a large number of complex additions and multiplications. Consequently, turning the complex mathematical description into an actual hardware implementation takes considerable effort. As a result, there is a need to develop an algorithm for efficient implementation of the DCT on hardware [6].

The discrete cosine transform (DCT) is the most widely used transformation technique in the field of signal processing, especially for image data [7]. For a long time the field of digital signal processing has been dominated by microprocessors. This is mainly because they provide designers with the advantages of single-cycle multiply-accumulate instructions as well as special addressing modes. Although these processors are inexpensive and flexible, they are relatively slow when it comes to performing certain demanding signal processing tasks, for example image compression, digital communication and video processing. Of late, rapid advancements have been made in the field of VLSI and IC design. As a result, special-purpose processors with custom architectures have come up. Higher speeds can be achieved by these customized hardware solutions at competitive costs. In addition, various simple and hardware-efficient algorithms exist which map well onto these chips and can be used to enhance speed and flexibility while performing the desired signal processing tasks [11]. By making slight changes so that only simple shift-and-add arithmetic is used, VLSI implementation of such an algorithm becomes easily feasible. The DCT algorithm has various applications and is widely used for image compression.

There are many transforms; among them, the Discrete Cosine Transform (DCT) was introduced by Ahmed, Natarajan and Rao in 1974. It has since become prominent among the many transforms and is one of the most widely used transform techniques. The DCT is a computationally intensive transform that requires many multiplications and additions [1]. Because of the widespread use of DCTs, research into fast algorithms for their implementation has been rather active; furthermore, since the DCT is computation-intensive, the development of high-speed hardware and real-time DCT processor design has been an object of research. The DCT is widely used in image processing, especially for compression. Applications of the two-dimensional DCT include still image compression and compression of individual video frames, while the multidimensional DCT is mostly used for compression of video streams.

The DCT is also useful for transferring multidimensional data to the frequency domain, where different operations, such as spread-spectrum, data compression and data watermarking, can be performed in an easier and more efficient manner. A number of papers discussing DCT algorithms are available in the literature, which indicates its importance and range of application. Hardware implementation of a parallel DCT is possible and would give higher throughput than software solutions. Special-purpose DCT hardware reduces the computational load on the processor and therefore improves the performance of the complete multimedia system. The throughput directly affects the quality of experience of multimedia content. Another important factor that influences the quality is the finite register length effect, which affects the accuracy of the forward-inverse transformation process. Hence, the motivation for investigating hardware-specific DCT algorithms is clear.

The use of cosine rather than sine functions is critical in these applications. For compression, cosine functions turn out to be much more efficient, while for differential equations the cosines express a particular choice of boundary conditions. In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample [5].
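The relationship between the DCT and a DFT of twice the length can be checked numerically. The sketch below is illustrative only and is not taken from the source: it assumes the unnormalised DCT-II convention, and the function names (`dct2_direct`, `dft`, `dct2_via_dft`) and the sample values are hypothetical.

```python
import cmath
import math


def dct2_direct(x):
    """Unnormalised DCT-II: C[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]


def dft(y):
    """Plain O(M^2) DFT, kept simple for illustration."""
    m = len(y)
    return [
        sum(y[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
        for k in range(m)
    ]


def dct2_via_dft(x):
    """DCT-II obtained from the DFT of the mirrored (even-symmetric) input."""
    n_pts = len(x)
    mirrored = list(x) + list(reversed(x))  # length 2N, symmetric about N - 1/2
    spectrum = dft(mirrored)
    # The half-sample shift shows up as the phase factor exp(-j*pi*k / (2N)).
    return [
        0.5 * (cmath.exp(-1j * math.pi * k / (2 * n_pts)) * spectrum[k]).real
        for k in range(n_pts)
    ]


samples = [4.0, 7.0, 1.0, -3.0, 2.5, 0.0, 6.0, -1.5]  # arbitrary test data
direct = dct2_direct(samples)
via_dft = dct2_via_dft(samples)
max_diff = max(abs(a - b) for a, b in zip(direct, via_dft))
print(f"largest difference between the two routes: {max_diff:.2e}")
```

Mirroring the input removes the jump at the block boundary that a plain periodic extension would create, which is why the phase-corrected DFT of the mirrored sequence is purely real.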

Figure 1 shows that implementations of the discrete cosine transform fall into three classes. The work reported in the past literature can be classified into three distinct categories:

- 1) Multiplier based
- 2) Distributed arithmetic
- 3) CORDIC algorithm

Image data, particularly high-quality images, carry an enormous amount of data; however, the storage and processing capacity of communication devices is under strain.

The various requirements for real-time transmission of multimedia data over different communication networks are also a big challenge, especially at the current high transmission rates. Therefore, efficient image compression plays an essential part in such cases. This section classifies the basic DCT-based techniques as shown in Figure 1.





Here, we give a few properties of the DCT which are of particular value to image processing applications.

**Decorrelation**- As discussed previously, the principal advantage of image transformation is the removal of redundancy between neighboring pixels. This leads to uncorrelated transform coefficients which can be encoded independently. The amplitude of the autocorrelation after the DCT operation is very small at all lags. Hence, it can be inferred that the DCT exhibits excellent decorrelation properties.

**Energy Compaction**- The efficiency of a transformation scheme can be directly gauged by its ability to pack the input data into as few coefficients as possible. This allows the quantizer to discard coefficients with relatively small amplitudes without introducing visual distortion in the reconstructed image. The DCT exhibits excellent energy compaction for highly correlated images.

**Separability-** The 2-D DCT can be separated into a sequence of 1-D transforms. This property, known as separability, has the principal advantage that the 2-D DCT can be computed in two steps by successive 1-D operations on the rows and columns of an image. This idea is graphically illustrated in Figure 2. The arguments presented apply identically to the inverse DCT computation.



Fig 2: Computation of 2-D DCT using separability property.
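The row-column idea can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used in this paper: it assumes the orthonormal DCT-II, uses plain Python lists, and the function names and the 4 x 4 test block are hypothetical.

```python
import math


def dct_1d(vec):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n_pts = len(vec)
    out = []
    for k in range(n_pts):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_pts)
        acc = sum(vec[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                  for n in range(n_pts))
        out.append(scale * acc)
    return out


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def dct_2d_separable(block):
    """Row pass followed by a column pass, i.e. two sets of 1-D transforms."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


def dct_2d_direct(block):
    """Direct double-sum 2-D DCT-II, used here only as a reference."""
    n_rows, n_cols = len(block), len(block[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            acc = 0.0
            for i in range(n_rows):
                for j in range(n_cols):
                    acc += (block[i][j]
                            * math.cos(math.pi * (2 * i + 1) * u / (2 * n_rows))
                            * math.cos(math.pi * (2 * j + 1) * v / (2 * n_cols)))
            scale_u = math.sqrt((1.0 if u == 0 else 2.0) / n_rows)
            scale_v = math.sqrt((1.0 if v == 0 else 2.0) / n_cols)
            out[u][v] = scale_u * scale_v * acc
    return out


block = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
separable = dct_2d_separable(block)
direct = dct_2d_direct(block)
max_diff = max(abs(separable[u][v] - direct[u][v])
               for u in range(4) for v in range(4))
print(f"largest difference: {max_diff:.2e}")
```

For an N x N block the separable route needs 2N one-dimensional transforms of length N, whereas the direct double sum touches all N x N inputs for each of the N x N outputs.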

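**Symmetry-** The row and column operations of the 2-D DCT are functionally identical; such a transformation is called a symmetric transformation. This is an extremely useful property since it implies that the transformation matrix can be precomputed offline and then applied to the image, thereby providing orders of magnitude improvement in computation efficiency.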

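**Orthogonality-** The DCT basis functions are orthogonal, so the inverse transformation matrix is equal to the transpose of the forward transformation matrix. Therefore, and in addition to its decorrelation characteristics, this property renders some reduction in the pre-computation complexity.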

The DCT is a real transform with better computational efficiency than the DFT, which by definition is a complex transform. It does not introduce a discontinuity while imposing periodicity on the time signal. In the DFT, as the time signal is truncated and assumed periodic, a discontinuity is introduced in the time domain and corresponding artifacts are introduced in the frequency domain. In the DCT, because even symmetry is assumed while truncating the time signal, no discontinuity and no related artifacts are introduced.

The rest of this paper is organized as follows. This section introduces the paper and gives a brief history of the discrete cosine transform. Section 2 outlines CORDIC architectures and their benefits. The literature is reviewed in Section 3. Section 4 describes the DCT and Section 5 gives its architecture. Section 6 gives the CORDIC algorithm and its principles, and the simulation tools are explained in Section 7. Section 8 gives the proposed algorithm and describes the results, while Section 9 discusses the comparative result analysis. Finally, the conclusion is given in Section 10.

## **2. CORDIC ARCHITECTURE & BENEFITS**

In this section, a few architectures for mapping the CORDIC algorithm into hardware are presented. In general, the architectures can be broadly classified as folded and unfolded, as shown in Figure 3, based on the realization of the three iterative equations. Folded architectures are obtained by duplicating each of the difference equations of the CORDIC algorithm into hardware and time-multiplexing all the iterations into a single functional unit. Folding provides a means of trading area for time in signal processing architectures.



Fig 3: Taxonomy of CORDIC architectures.

The folded architectures can be categorized into bit-serial and word-serial architectures, depending on whether the functional unit implements the logic for one bit or one word of each iteration of the CORDIC algorithm. The CORDIC algorithm has traditionally been implemented using a bit-serial architecture with all iterations executed in the same hardware. This slows down the computational device and hence is not suitable for high-speed implementation. The word-serial architecture is an iterative CORDIC architecture obtained by realizing the iteration equations directly. In this architecture, the shifters are modified in each iteration to produce the desired shift for that iteration. The appropriate elementary angles, $\alpha_i$, are accessed from a lookup table. The most dominant speed factors during the iterations of the word-serial architecture are carry/borrow-propagate addition/subtraction and variable shifting operations, which render the conventional CORDIC implementation slow for high-speed applications. These drawbacks are overcome by unfolding the iteration process, so that each of the processing elements always performs the same iteration.

The coordinate rotation digital computer (CORDIC) technique is an efficient algorithm that can iteratively evaluate trigonometric, exponential or logarithmic functions (among others), as well as perform vector rotations, by means of a shift-and-add structure, which ensures efficient use of the computation resources while obtaining better performance than the multiply-and-accumulate (MAC) arithmetic unit [6]. In this work, one of the most computationally intensive algorithms, the Discrete Cosine Transform, is implemented with the help of the CORDIC (Coordinate Rotation Digital Computer) algorithm, which results in a multiplier-less architecture and consumes less hardware. CORDIC uses only shift-and-add arithmetic with table look-up to implement different functions. The DCT algorithm has various applications and is widely used for image compression.
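To make the shift-and-add structure concrete, the following sketch models a word-serial, rotation-mode CORDIC in integer arithmetic. It is a software illustration under assumptions that are not stated in the source: a Q16 fixed-point format, 16 iterations, and hypothetical names such as `ANGLE_LUT` and `cordic_sin_cos`.

```python
import math

ITERATIONS = 16
FRAC_BITS = 16                 # assumed Q16 format: value = integer / 2**16
ONE = 1 << FRAC_BITS

# Look-up table of the elementary angles atan(2^-i), stored as Q16 integers.
ANGLE_LUT = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The micro-rotations stretch the vector by a fixed gain; its inverse is
# folded into the start value so that no multiplier is needed afterwards.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
INV_GAIN_Q = round(ONE / GAIN)


def cordic_sin_cos(angle_rad):
    """Return (sin, cos) for |angle_rad| <= pi/2 using shifts, adds and a LUT."""
    x, y = INV_GAIN_Q, 0
    z = round(angle_rad * ONE)          # residual angle still to be rotated
    for i in range(ITERATIONS):
        x_shift = x >> i                # arithmetic shift = multiply by 2^-i
        y_shift = y >> i
        if z >= 0:                      # rotate counter-clockwise
            x, y, z = x - y_shift, y + x_shift, z - ANGLE_LUT[i]
        else:                           # rotate clockwise
            x, y, z = x + y_shift, y - x_shift, z + ANGLE_LUT[i]
    return y / ONE, x / ONE


for test_angle in (0.0, math.pi / 6, -math.pi / 4, 1.2):
    s, c = cordic_sin_cos(test_angle)
    print(f"{test_angle:+.4f} rad -> sin {s:+.5f}, cos {c:+.5f}")
```

Each loop pass corresponds to one clock cycle of a folded word-serial unit: two variable shifts, three add/subtract operations and one table read. An unfolded design would instead instantiate one such stage per iteration with a fixed shift.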

## **3. RELATED WORK**

This section reviews the existing work done in the field of the DCT. The literature survey covers multiplier-based DCT, distributed-arithmetic-based DCT and CORDIC-algorithm-based DCT.

**Sudarshan TSB et al. in [1]** have proposed an architecture for convolution-based DFT and its FPGA implementation. The proposed architecture contains a pre-processing element, a systolic array and a post-processing stage. The performance analysis is carried out in terms of hardware utilization and computation time and compared with existing similar architectures. Further, as the convolution-based DCT has two systolic arrays like that of the DFT, a unified architecture is proposed for 1D DFT/1D DCT.

**Prof. Bhaskar S.V in [2]** deals with the efficient implementation of the discrete cosine transform and inverse discrete cosine transform algorithms by means of an advanced CORDIC algorithm. These algorithms are the most widely used transform methods in digital signal and image processing. An efficient adder is used in place of a multiplier. The signal flow of the 8-point DCT and IDCT CORDIC algorithms is coded and the functionality of the design is verified using a simulator. The design is synthesized using the Cadence synthesis tool and the bit file is downloaded to a Spartan 4 FPGA kit.

**Linbin Chen et al. in [3]** have proposed a new approximate scheme for coordinate rotation digital computer (CORDIC) design. The scheme is based on modifying the existing Para-CORDIC architecture with an approximation that is inserted in multiple parts and made possible by relaxing the CORDIC algorithm itself. A fully parallel approximate CORDIC (FPAX-CORDIC) scheme is proposed; this scheme avoids the memory register of Para-CORDIC and makes the generation of the rotation direction fully parallel. A comprehensive analysis and evaluation of the error introduced by the approximation, together with various circuit-related metrics, are pursued using HSPICE as the simulation tool.

**Huijie Zhu et al. in [4]** have presented a modified CORDIC algorithm for computation of the arctangent. Unlike the conventional CORDIC algorithm, which has a constant number of iterations, the proposed algorithm computes the closest predefined angle to the input phase at the start of each iteration and then rotates by that optimal angle. As it has variable iterations, the novel rotation strategy gives faster convergence, which reduces the number of iterations compared with previous approaches. Further, the performance of the proposed algorithm is improved compared with that of the conventional one at the cost of the same maximum shift number. The performance of the proposed algorithm is validated by numerical results from various simulation experiments.

**Muhammad Nasir Ibrahim et al. in [5]** have implemented a CORDIC coprocessor on a Field Programmable Gate Array (FPGA) to accelerate the performance of several arithmetic computations such as multiplication and division, as well as 11 elementary transcendental functions. As the CORDIC algorithm suffers from limitations in its convergence domain and speed, the unified argument reduction algorithm and the hybrid angle method were adopted. The coprocessor was integrated into a NIOS II soft processor to develop a NIOS II-based embedded System-on-Chip (SoC), implemented on an Altera DE0 board running at a clock frequency of 50 MHz.

**Somayyeh Mohammadi et al. in [13]** have investigated a transform-based watermarking technique. The watermark signal is built by using a chaotic map and is then embedded in the cosine coefficients of the scrambled host image. Experimental results confirm that the watermarked signal is robust against a range of signal processing operations and attacks that threaten a watermarking system.

**K Ravi Kiran et al. in [14]** have designed a fast adder for a hardware-efficient DCT-based algorithm and report leakage power, internal power, net power, switching power, delay and power delay product (PDP).

**E. Jebamalar Leavlin et al. in [17]** note that the discrete cosine transform (DCT) is widely used in image and video compression standards. Their paper presents a low-power coordinate rotation digital computer (CORDIC) based reconfigurable architecture for the DCT. Not all the computations in the DCT are equally significant in generating the frequency-domain output.

**K. Kalyani et al. in [18]** have noted that the fast Fourier transform (FFT) is one of the most important algorithms in signal processing and communications and is used in orthogonal frequency division multiplexing (OFDM) systems.

After reviewing the existing work on the DCT, several problems have been identified. Real-time data processing requires special-purpose hardware that combines hardware efficiency with high throughput. Architectures that include multipliers, for instance Chen's algorithm, have a less regular structure because of complex routing and require a large silicon area. To eliminate or reduce these limitations of discrete cosine transform design, several research objectives are proposed and presented in this paper. The main objective is to design an efficient discrete cosine transform based on a modified CORDIC algorithm; the design consists of a special serial arithmetic unit with three shift registers, three adders/subtractors, a look-up table and special interconnections. The performance of different multiplier-based and DCT topologies is compared and analyzed in terms of the number of slices, the number of LUTs, the number of bonded input/output blocks (IOBs) and the maximum combinational path delay (MCPD). The aim is to reduce the delay and the slice count (area) of the DCT architecture based on the modified CORDIC algorithm.

## **4. DISCRETE COSINE TRANSFORM**

The discrete cosine transform (DCT) is the most widely used transformation technique in the field of signal processing, especially for image data. As multimedia applications on portable low-power devices become more prominent, the need for efficient low-power image encoding and decoding techniques increases. The use of the DCT has been studied extensively, and the corresponding algorithms involve a large number of complex additions and multiplications. Consequently, turning the complex mathematical description into an actual hardware implementation takes considerable effort, and there is a need to develop an algorithm for efficient implementation of the DCT on hardware. Since the DCT was proposed [1], many efficient and easy-to-implement algorithms have been developed. These algorithms are based on either a direct or an indirect approach to computing the DCT. The indirect approaches compute the DCT through the discrete Fourier transform (DFT), while the direct methods are based on decomposition of the DCT equation itself [6].

The Discrete Cosine Transform (DCT) was first proposed by Ahmed et al. (1974), and it has become more and more important in recent years. The DCT has been widely used in signal processing of image data, especially in coding for compression and in particular lossy compression, because of its near-optimal performance. The cosine transform performs only the cosine-series expansion.

The procedure for a 1-D sequence of length $n$ is shown in the algorithm below:

```
Input: binary sequence s = s_0, s_1, \ldots, s_{n-1} of length n
Output: linear complexity 0 \le L(s^n) \le n
```

```
begin
   C(x) = 1
   B(x) = 1
   L = 0
   m = -1
   N = 0
   while (N < n)
      d = s_N \oplus \sum_{i=1}^{L} c_i s_{N-i}   (computes the next discrepancy)
      if (d = 1)
         T(x) = C(x)
         C(x) = C(x) + B(x) \cdot x^{N-m}
         if (L \leq N/2)
            L = N + 1 - L
            m = N
            B(x) = T(x)
         end if
      end if
      N = N + 1   (advance to the next symbol in every pass)
   end while
   return L
end
```

## **5. DCT ARCHITECTURE**

As can be observed in Figure 4, there are eight inputs in total, from x(0) to x(7), and for these inputs the DCT may be implemented using 22 multipliers and 28 adders.



Fig 4: Example of the 8-Point 1D- DCT structure.



Only the values of the basis function change in each subsequence. This is an important property, since it shows that the basis functions can be pre-computed offline and then multiplied with the sub-sequences. This reduces the number of mathematical operations and thereby improves computational efficiency.
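The benefit of pre-computed basis values, combined with the symmetry of the cosine basis, can be illustrated for the 8-point case. The sketch below shows only the first butterfly stage of a flow-graph style decomposition; it is not the 22-multiplier structure of Figure 4, it assumes the unnormalised DCT-II, and the names `COS_TABLE`, `dct8_direct` and `dct8_even_odd` are hypothetical.

```python
import math

N = 8
# Basis values computed once, ahead of time ("offline").
COS_TABLE = [[math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
             for k in range(N)]


def dct8_direct(x):
    """Unnormalised 8-point DCT-II: 8 x 8 = 64 multiplications."""
    return [sum(COS_TABLE[k][n] * x[n] for n in range(N)) for k in range(N)]


def dct8_even_odd(x):
    """Same transform after one butterfly stage: 8 x 4 = 32 multiplications."""
    half = N // 2
    sums = [x[n] + x[N - 1 - n] for n in range(half)]    # feed even-indexed outputs
    diffs = [x[n] - x[N - 1 - n] for n in range(half)]   # feed odd-indexed outputs
    out = [0.0] * N
    for k in range(N):
        source = sums if k % 2 == 0 else diffs
        out[k] = sum(COS_TABLE[k][n] * source[n] for n in range(half))
    return out


x = [12.0, 10.0, 8.0, 10.0, 12.0, 10.0, 8.0, 11.0]   # arbitrary input block
reference = dct8_direct(x)
fast = dct8_even_odd(x)
max_diff = max(abs(a - b) for a, b in zip(reference, fast))
print(f"largest difference: {max_diff:.2e}")
```

The stage works because the basis value for sample 7 - n equals the value for sample n multiplied by (-1)^k, so each output needs only four table entries once the sums and differences are formed.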

## **6. CORDIC ALGORITHM & PRINCIPLES**

Figure 5 presents the modified one-dimensional discrete cosine transform in direct form. Where the 1-D DCT structure in direct form uses 4 multipliers and 2 adders, these are replaced by 3 multipliers and 3 adders, and each adder and subtractor pair is replaced by a single block. The modified discrete cosine transform block uses 15 adders, 15 subtractors and 10 multipliers.
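One well-known way to trade a multiplier for an adder in a plane rotation is sketched below. The source does not state which factorisation Figure 5 uses, so this is an assumption offered for illustration; the function names are hypothetical.

```python
import math


def rotate_4mul(x, y, theta):
    """Plane rotation written directly: 4 multiplications, 2 additions."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c


def make_rotator_3mul(theta):
    """Build a rotation that costs 3 multiplications and 3 additions per call.

    The constants c, (c + s) and (s - c) depend only on the angle, so they are
    computed once here; in hardware they would be stored coefficients.
    """
    c, s = math.cos(theta), math.sin(theta)
    c_plus_s, s_minus_c = c + s, s - c

    def rotate(x, y):
        shared = c * (x + y)                   # 1 addition, 1 multiplication
        x_out = shared - c_plus_s * y          # equals x*c - y*s
        y_out = shared + s_minus_c * x         # equals x*s + y*c
        return x_out, y_out

    return rotate


theta = 3 * math.pi / 16                       # an angle typical of 8-point DCT graphs
rotate = make_rotator_3mul(theta)
print(rotate_4mul(0.7, -1.3, theta))
print(rotate(0.7, -1.3))
```

The saving comes from sharing the product c(x + y) between both outputs, at the price of one extra addition and one extra stored constant.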



The simplest form of CORDIC is based on the observation that if a unit-length vector with its end point at $(x, y) = (1, 0)$ is rotated by an angle $\alpha$, its new end point will be at $(x, y) = (\cos \alpha, \sin \alpha)$. Thus the cosine and sine of $\alpha$ can be calculated by finding the coordinates of the new end point of the vector after rotation by the angle $\alpha$. The block diagram of the CORDIC processor is shown in Figure 6.



Fig 6: CORDIC block diagram.

The basic equations of the CORDIC algorithm are

$$\begin{aligned} x_{i+1} &= x_i \cos(\alpha) - y_i \sin(\alpha) \\ y_{i+1} &= y_i \cos(\alpha) + x_i \sin(\alpha) \end{aligned}$$
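A standard textbook step, not spelled out in the source, shows how these rotation equations turn into shift-and-add operations. Writing the angle of step $i$ as $\alpha_i$, the cosine is factored out and the angle is restricted so that its tangent is a power of two. The symbols $d_i$ (rotation direction), $z_i$ (residual angle), $\hat{x}_i, \hat{y}_i$ (unscaled coordinates) and $K_n$ (accumulated gain) are introduced here for illustration only.

$$\begin{aligned}
x_{i+1} &= \cos\alpha_i \,\bigl(x_i - y_i \tan\alpha_i\bigr) \\
y_{i+1} &= \cos\alpha_i \,\bigl(y_i + x_i \tan\alpha_i\bigr) \\
\tan\alpha_i &= d_i\,2^{-i}, \qquad d_i \in \{-1, +1\} \\
\hat{x}_{i+1} &= \hat{x}_i - d_i\,2^{-i}\,\hat{y}_i \\
\hat{y}_{i+1} &= \hat{y}_i + d_i\,2^{-i}\,\hat{x}_i \\
z_{i+1} &= z_i - d_i \arctan\bigl(2^{-i}\bigr) \\
K_n &= \prod_{i=0}^{n-1} \frac{1}{\cos\alpha_i} = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \approx 1.6468 \quad \text{for large } n
\end{aligned}$$

Because $\cos\alpha_i$ does not depend on the sign $d_i$, the gain $K_n$ is a constant that can be compensated once, either in the initial value or at the output, leaving only shifts, additions and a table of $\arctan(2^{-i})$ values inside the iteration.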

## **7. SIMULATION TOOL**

VHDL is short for VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. It is a hardware description language that can be used to describe the structure and/or behavior of hardware designs and to model digital systems. VHDL is very flexible owing to its architecture, allowing designers, electronic design automation companies and the semiconductor industry to experiment with new language concepts to ensure good design tools and data interoperability. The project properties are shown in Figure 7. Having designed the various DSP architectures, we now proceed to the software synthesis of these designs using VHDL. In the following sections, we have established the desired filter outputs using separate VHDL codes for each design. The codes of the designs are shown in the Appendix. The structures discussed were successfully implemented and synthesized on Xilinx ISE Design Suite 14.1i.



| Property Name                  | Value              |
|--------------------------------|--------------------|
| Product Category               | 8                  |
| Family                         | Virtex2P           |
| Device                         | XC2VP30            |
| Package                        | FF896              |
| Speed                          | -7                 |
| Top-Level Source Type          | HDL                |
| Synthesis Tool                 | XST (VHDL/Verilog) |
| Simulator                      | XST (HDL/Verilog)  |
| Preferred Language             | Verilog            |
| Enable Enhanced Design Summary | (checkbox)         |
| Enable Message Filtering       | (checkbox)         |
| Display Incremental Messages   | (checkbox)         |

Fig 7: Xilinx ModelSim Simulation of New Project Wizard.

Libraries store important data and small snippets of code that make coding much easier. The main library that must be used in every program is called 'IEEE'; without it, statements will not be recognized by the simulators.

```
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;     -- contains essentially all the basic logic type definitions
use IEEE.STD_LOGIC_ARITH.ALL;    -- lets the user perform functions such as data type conversion
use IEEE.STD_LOGIC_UNSIGNED.ALL; -- lets us perform arithmetic operations on STD_LOGIC data
                                 -- types by treating them as unsigned numbers
```

In this report, Xilinx 6.11 and 14.21 have been used for the analysis and simulation of the circuits. The two versions report different numbers of slices, IOBs, memory, propagation delay and so on. 14.21 is an updated version of the Xilinx software tool.

## **8. PROPOSED ALGORITHM & RESULTS**

Proposed Algorithm: CORDIC Square Root Kernel

```
ki = 4; % index of the next repeated iteration; updated as (3*ki + 1)
for idx = 1:n
    xtmp = bitsra(x, idx); % multiply by 2^(-idx)
    ytmp = bitsra(y, idx); % multiply by 2^(-idx)
    if y < 0
        x(:) = x + ytmp;
        y(:) = y + xtmp;
    else
        x(:) = x - ytmp;
        y(:) = y - xtmp;
    end
    if idx==ki
        % repeat this iteration once, as required for convergence
        xtmp = bitsra(x, idx); % multiply by 2^(-idx)
        ytmp = bitsra(y, idx); % multiply by 2^(-idx)
        if y < 0
            x(:) = x + ytmp;
            y(:) = y + xtmp;
        else
            x(:) = x - ytmp;
            y(:) = y - xtmp;
        end
        ki = 3*ki + 1;
    end
end % idx loop
```

This CORDIC algorithm is implemented in MATLAB for calculating and plotting the square root of a value.
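The same kernel can be exercised in floating point to see why it yields a square root. The sketch below is not the authors' MATLAB code: it replaces `bitsra` by multiplication with a power of two, assumes the input has already been normalised to the range 0.5 <= v < 2, and uses the hypothetical function name `cordic_sqrt`. The starting values x = v + 0.25 and y = v - 0.25 are chosen so that x^2 - y^2 = v.

```python
import math


def cordic_sqrt(v, n=24):
    """Approximate sqrt(v) for 0.5 <= v < 2 with hyperbolic vectoring CORDIC."""
    x, y = v + 0.25, v - 0.25       # x^2 - y^2 == v
    gain = 1.0                      # product of sqrt(1 - 2^(-2*idx)) over all steps
    repeat_at = 4                   # iterations 4, 13, 40, ... are executed twice
    for idx in range(1, n + 1):
        passes = 2 if idx == repeat_at else 1
        for _ in range(passes):
            factor = 2.0 ** -idx    # stands in for the right shift by idx bits
            xtmp, ytmp = x * factor, y * factor
            if y < 0:
                x, y = x + ytmp, y + xtmp
            else:
                x, y = x - ytmp, y - xtmp
            gain *= math.sqrt(1.0 - factor * factor)
        if idx == repeat_at:
            repeat_at = 3 * repeat_at + 1
    # y has been driven towards zero, so x is now gain * sqrt(v).
    return x / gain


for value in (0.5, 1.0, 1.44, 1.9):
    print(f"sqrt({value}) ~ {cordic_sqrt(value):.8f} (math.sqrt: {math.sqrt(value):.8f})")
```

Every step multiplies x^2 - y^2 by the known factor 1 - 2^(-2 idx), so once y is negligible the remaining x differs from the square root only by a constant gain that can be pre-computed.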



Fig 8: The proposed CORDIC based Flow chart for calculating the phase angle.

This application presents the use of the CORDIC algorithm for computing the sine and cosine functions and plotting the respective outputs corresponding to the CORDIC rotation kernels. An example of the x and y values calculated using the proposed rotation kernel algorithm is shown in Figure 9.

Fig 9: Example of calculation of sine and cosine functions using CORDIC Algorithm.





Application of DCT for Speech Compression

The original speech signal is plotted alongside the output obtained by applying the DCT to the input synthetic speech signal, as an example in MATLAB to illustrate speech compression.



Fig 11: Showing the example of the speech signal Compression.

| Name           | Value            | Min |
|----------------|------------------|-----|
| Z_abs_cordic   | 7.6158 + 0.0001i | 7.6 |
| Z_abs_ideal    | 7.6158           | 7.6 |
| Z_angle_cordic | 23.1979          | 23. |
| Z_angle_ideal  | 23.1986          | 23. |
| alpha_k        | <32x1 double>    | 2.6 |
| err_abs        | 6.0291e-10       | 6.0 |
| err_angle      | 7.2097e-04       | 7.2 |
| k              | 15               | 15  |
| real_Z         | <1x16 double>    | 10  |
| s              | 1                | 1   |
| thetaHat       | -23.1979         | -23 |
| thetaHat_v     | <1x16 double>    | -45 |

Fig 12: Calculated phase angle using CORDIC algorithm



The calculated phase angle using the mentioned algorithm is shown in the Figure 12.
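A vectoring-mode CORDIC of the kind that produces such a magnitude and phase can be sketched as follows. The input 7 + 3j is an assumption: the source does not state the input, but this value is consistent with the ideal results in Figure 12, since sqrt(7^2 + 3^2) is about 7.6158 and atan(3/7) is about 23.1986 degrees. The function name and the floating-point arithmetic are likewise illustrative and not the authors' MATLAB script.

```python
import math


def cordic_vectoring(x, y, n=16):
    """Return (magnitude, phase in degrees) of x + jy for x > 0."""
    gain = 1.0
    angle = 0.0
    for i in range(n):
        step = 2.0 ** -i
        # Rotate towards the x-axis: clockwise if y is positive.
        d = -1.0 if y > 0 else 1.0
        x, y = x - d * step * y, y + d * step * x
        angle -= d * math.degrees(math.atan(step))   # accumulate the angle removed
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, angle


magnitude, phase = cordic_vectoring(7.0, 3.0)
print(f"magnitude ~ {magnitude:.4f}, phase ~ {phase:.4f} degrees")
```

With 16 iterations the residual angle is bounded by atan(2^-15), roughly 0.002 degrees, which is the same order as the angle error listed in Figure 12.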

## **9. COMPARATIVE RESULT ANALYSIS**

Table 1 compares register usage in the existing work and in the proposed DCT implementation. It can be seen from the table that the number of 16-bit shift registers is reduced from 42 to 34. The number of 16-bit adder-subtractors is likewise reduced from 42 to 34.

| Parameter               | Existing Architecture for DCT | Proposed Architecture for DCT |
|-------------------------|-------------------------------|-------------------------------|
| 16-bit Shift register   | 42                            | 34                            |
| 16-bit Adder-Subtractor | 42                            | 34                            |

The second sub-block, the multiplier, is replaced by the CORDIC algorithm. Another component used is the carry select adder (CSLA). We have implemented the discrete cosine transform using the CORDIC method and the CSLA to achieve good computation speed compared with other existing algorithms, as shown in Table 2. Comparative charts are shown in Figures 13 and 14.

| Parameter                     | Existing Algorithm for DCT | Modified Algorithm for DCT |
|-------------------------------|----------------------------|----------------------------|
| No. of Slice Registers        | 1102                       | 342                        |
| No. of Slice LUTs             | 2541                       | 1303                       |
| No. of fully used LUT-FF pairs | 958                       | 1409                       |
| No. of IOBs                   | 347                        | 258                        |
| Maximum Frequency             | 224.9 MHz                  | 168.245 MHz                |
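The relative change in each parameter can be worked out directly from the values in Table 2. The short script below is only an arithmetic aid; the dictionary layout and the function name `percent_change` are illustrative.

```python
# (existing, modified) values copied from Table 2.
TABLE_2 = {
    "Slice registers": (1102, 342),
    "Slice LUTs": (2541, 1303),
    "Fully used LUT-FF pairs": (958, 1409),
    "IOBs": (347, 258),
    "Maximum frequency (MHz)": (224.9, 168.245),
}


def percent_change(existing, modified):
    """Signed change relative to the existing design; negative means a reduction."""
    return 100.0 * (modified - existing) / existing


changes = {name: percent_change(*pair) for name, pair in TABLE_2.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```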

We have written a main code that describes the logical behavior of the three sub-blocks: delay, multiplier and adder. There are three separate codes for the three sub-blocks. The delay block uses a D flip-flop, which captures the value of the D input at a definite portion of the clock cycle (such as the rising edge of the clock), and an 8-bit register made of D flip-flops stores the bits.



Fig 13: Bar Graph for the Comparison of Different DCT Architecture in terms of number of Registers.



Fig 14: Bar Graph of Comparison for various parameters.



## **10. CONCLUSION**

We introduced new scaling units for the x and y data paths separately, considering the practical overall scaling constants. The proposed approximation is word-length dependent, and the word length of the data paths can be varied according to the required accuracy. The CORDIC unit with the proposed scaling units is implemented for various ranges of inputs and an error analysis is carried out, taking the word length of the data path as 16 bits. The maximum frequency at which the design can be implemented in a Xilinx FPGA manufactured in 90 nm process technology is 75.593 MHz, and its slice delay product is 106.465. We have also proposed a low-hardware-complexity design to implement the DFT using low-latency scaled CORDIC. By exploiting the symmetry properties of the transform, the number of additions/subtractions has been minimized so as to achieve minimal power dissipation, minimal area and improved latency. A systolic array architecture for real-time DCT computation may suffer from a large number of gates and from delay problems. We have implemented an 8-point DCT using the modified CORDIC algorithm and calculated the slice registers, slice LUTs, IOBs and maximum frequency in order to compare all the parameters. Two highly programmable, low-delay and efficient CORDIC algorithms were presented, verified and compared with similar logic structures previously published. The proposed modified CORDIC algorithm ranks among the best existing CORDIC algorithms in terms of delay, slice registers and maximum frequency.

### **REFERENCES:-**

[1] Sudarshan TSB, Nikhita Raj J, Shikha Tripathi, "Systolic Architecture Implementation of 1D DFT and 1D DCT", IEEE 2015, pp 1-5.

[2] Prof. Bhaskar S.V, "Efficient Simulation of DCT architecture using CORDIC Algorithm", IJARST 2020, pp 11-17.

[3] Linbin Chen, Jie Han, Weiqiang Liu and Fabrizio Lombardi, "Algorithm and Design of a Fully Parallel Approximate Coordinate Rotation Digital Computer (CORDIC)", IEEE TRANSACTIONS, 2017, pp 1-13.

[4] Huijie Zhu, Yizhou Ge, Bin Jiang, "Modified CORDIC Algorithm for Computation of Arctangent with Variable Iterations", IEEE 2016, pp 261-264.

[5] Muhammad Nasir Ibrahim, Mariani Idroas, Chen Kean Tack, "Processing Discrete Cosine Transform using Coordinate Rotation Digital Computer Co Processor on Field Programmable Gate Array", International Journal of Advanced Science and Technology 2019, pp 176-183.

[6] Deboraj Muchahary, Abir J. Mondal, and Alak Majumder, "A CORDIC Based Design Technique for Efficient Computation of DCT", IEEE ICCSP 2015, pp 1-5.

[7] Deboraj Muchahary, Abir J Mondal, Rajesh Singh Parmar, Amlan Deep Borah, Alak Majumder, "A Simplified Design Approach for Efficient Computation of DCT", IEEE 2015, pp 483-487.

[8] Linbin Chen and Fabrizio Lombardi, Jie Han, Weiqiang Liu, "A Fully Parallel Approximate CORDIC Design", IEEE 2016, pp 1-6.

[9] Bharati Y Masram and PT Karule, "High Speed 3D-DCT/IDCT CORDIC Algorithm for DSP Application", European Journal of Advances in Engineering and Technology, 2017, pp 941-950.

[10] Namrata Sarode, Rajeev Atluri, P.K. Dakhole, "Mixed-Radix and CORDIC Algorithm for Implementation of FFT", IEEE ICCSP 2015, pp 1628-1634.

[11] Aarti Ranji Nashrah Fatima, Dr. Paresh Rawat, "Area Efficient VLSI Architecture for DCT using Modified CORDIC Algorithm", IEEE 2016, pp 1-4.

[12] Neelam Sharma, Vipul Agrawal, Sourabh Sharma, "Performance Analysis of 1-D DFT & 1-D DCT using CORDIC Algorithm", IJSRSET 2016, pp 957-961.

[13] Somayyeh Mohammadi, Kenman and Iran, "A Chaotic Watermarking Scheme using Discrete Cosine Transform", 978-1-4673-7609-9/15/\$31.00 ©2015 IEEE.

[14] K Ravi Kiran, Prof C Ashok Kumar and M Suresh Kumar, "Design and Analysis of A Novel High Speed Adder Based Hardware Efficient Discrete Cosine Transform (DCT)", 2015 Fifth International Conference on Advances in Computing and Communications.

[15] Teena Susan Elias and Dhanusha P B, "Area Efficient Fully Parallel Distributed Arithmetic Architecture for One-Dimensional Discrete Cosine Transform", 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT) IEEE 2014.

[16] Muhammad H. Rais, "FPGA Design and Implementation of Fixed Width Standard and Truncated 6×6-bit Multipliers: A Comparative Study", 978-1-4244-5750-2/10/\$26.00 ©2009 IEEE.

[17] E. Jebamalar Leavline, S. Megala and D. Asir Antony Gnana Singh, "CORDIC Iterations Based Architecture for Low Power and High Quality DCT", 2014 International Conference on Recent Trends in Information Technology 978-1-4799-4989-2/14/\$31.00 © 2014 IEEE.

[18] K. Kalyani, D. Sellathambi and S. Rajaram, "Reconfigurable FFT using CORDIC based architecture for MIMO-OFDM receivers", Thiagarajar College of Engineering, Madurai 2014.

International Journal of Engineering Applied Science and Management ISSN (Online): 2582-6948 Vol. 2 Issue 3, March 2021

[19] Esakkirajan G, Member IEEE and Annadurai C, "CORDIC Based High Speed DCT Algorithm", International Conference on Communication and Signal Processing, April 3-5, 2014, India.

[20] Bhavit Kaushik, Ravi Saini, Anil Saini, Sanjay Singh and A. S. Mandal, "An FPGA Implementation of Image Signature based Visual-Saliency Detection", PP. 01-05, 2014, India.

[21] Uma Sadhvi Potluri and Arjuna Madanayake, "Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions", IEEE Transactions on Circuits and Systems—I: Regular Papers.

[22] Ulises S. Mendoza-Camaren and Romero-Troncoso, "VHDL Core for the Computation of the One-Dimensional Discrete Cosine Transform", 1-4244-0690-0/06/\$20.00 ©2006 IEEE.

[23] Ramkrishna Swamy, Maziyar Khorasani, Yongjie Liu, Duncan Elliott and Stephen Bates, "A Fast, Pipelined Implementation of a Two-Dimensional Inverse Discrete Cosine Transform", IEEE 2005 pp 1-7.

[24] J. Han and M. Orshansky, "Approximate computing: An emerging paradigm for energy-efficient design," in Proc. 18th IEEE European Test Symposium (ETS), 2013, pp. 1-6.

[25] H. Jiang, J. Han, and F. Lombardi, "A Comparative Review and Evaluation of Approximate Adders," in Proc. 25th Great Lakes Symposium on VLSI, Pittsburgh, Pennsylvania, USA, 2015, pp. 343-348.

[26] J. Liang, J. Han, and F. Lombardi, "New Metrics for the Reliability of Approximate and Probabilistic Adders," IEEE Trans. on Computers, vol. 62, pp. 1760-1771, 2013.

[27] M. J. Schulte and E. E. Swartzlander, "Truncated multiplication with correction constant," VLSI Signal Processing VI, pp. 388-396, 1993.

[28] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-Power Digital Signal Processing Using Approximate Adders," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, pp. 124-137, 2013.

[29] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. on Electronic Computers, vol. EC-8, pp. 330-334, 1959.

[30] J. S. Walther, "A unified algorithm for elementary functions," in Proc. Spring Joint Computer Conference, Atlantic City, New Jersey, 1971, pp. 379-385.

[31] L. Chen, J. Han, W. Liu, and F. Lombardi, "A Fully Parallel Approximate CORDIC Design," in Proc. ACM/IEEE Symposium on Nano Architectures, Beijing, 2016, pp. 197-202.

[32] D. Timmermann, H. Hahn, and B. J. Hosticka, "Low latency time CORDIC algorithms," IEEE Trans. on Computers, vol. 41, pp. 1010-1015, 1992.

[33] T. Srikanthan and B. Gisuthan, "A novel technique for eliminating iterative based computation of polarity of micro-rotations in CORDIC based sine-cosine generators," Microprocessors and Microsystems, vol. 26, pp. 243-252, 2002.

[34] B. Gisuthan and T. Srikanthan, "Pipelining flat CORDIC based trigonometric function generators," Microelectronics Journal, vol. 33, pp. 77-89, 2002.

[35] H. S. Kebbati, J. P. Blonde, and F. Braun, "A new semi-flat architecture for high speed and reduced area CORDIC chip," Microelectronics Journal, vol. 37, pp. 181-187, 2006.

[36] B. Lakshmi and A. S. Dhar, "CORDIC Architectures: A Survey," VLSI Design, 2010.

[37] T.-B. Juang, S.-F. Hsiao, and M.-Y. Tsai, "Para-CORDIC: parallel CORDIC rotation algorithm," IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 51, pp. 1515-1524, 2004.

[38] S. Wang, V. Piuri, and E. E. Swartzlander, Jr., "Hybrid CORDIC algorithms," IEEE Trans. on Computers, vol. 46, pp. 1202-1207, 1997.