## Design of Energy Efficient Low Power Adder using Multi-mode Addition

<sup>1</sup>P. Sangeetha, <sup>2</sup>M. Thiruppathi, <sup>1</sup>PG Student [VLSI], <sup>2</sup>Assistant Professor, <sup>1</sup>Department of ECE, <sup>1</sup>Vivekanandha College of Engineering for Women, Namakkal, India

Abstract—Adders are usually designed for the worst case, in which the carry propagates across all bits, although such cases rarely occur in real operation. This work takes advantage of the infrequent worst-case occurrences by designing adders for the average case. Such a design implies that computation errors may occur. These are corrected by implementing multi-mode addition with the help of a dedicated control circuit. A power-delay-energy model is presented, allowing the optimum design point to be found. We demonstrate that in situations where the system's critical paths are dictated by the adders, the system's operating voltage can be scaled without harming the clock cycle and with extremely small performance degradation. The multi-mode adder has been integrated in a 32-bit pipelined MIPS processor, demonstrating the correctness of this design methodology.

Index Terms—low power design, multi-mode adders, voltage scaling.

### I. INTRODUCTION

The design of adders assumes that addition must be completed accurately within a prescribed period, independently of the input operands. Consequently, the worst case of carry propagation across all bits is targeted. When high performance is desired, prefix and Carry Look-Ahead (CLA) adder architectures, which consume large area and energy, are used. With the spread of mobile and green computing, the focus of adder design is on low energy. The most significant factor in adder optimization is its carry propagation probabilities, which were used to estimate the adder's energy consumption in [4, 5]. We present an architecture called multi-mode, where the adder is designed for the expected longest carry rather than the worst case. The addition completes within a single clock cycle with minimum energy in most cases, while with an extremely small probability one or more additional cycles may be required.

Precision, performance and energy of adders have been traded off in several works. Energy savings in adders can be achieved by sacrificing accuracy in applications that can tolerate it. In [6] it was proposed to reduce the logic complexity of a full adder at the transistor level and to reduce the accuracy in a design of multi-bit adders. In addition to the inherent reduction in switched capacitance, the method resulted in a significant reduction of the critical paths, thus enabling voltage scaling. The approximate adder was used for video and image compression, achieving up to 69% power savings compared to accurate adders. Inaccurate processing has also been used for multiplication [7]. It achieved average power savings of 32–45% over corresponding accurate multiplier designs, with average errors of 1.4–3.3%.

Unlike [6] and [7], which pay in precision, a technique called Razor [8] pays with a small performance degradation. Voltage is scaled down so that most of the time the underlying logic safely completes its computation. Our technique is somewhat reminiscent of Razor in the sense that it targets the average case of propagation delay rather than the worst case. It is considerably different, though. Firstly, Razor detects whether all the critical outputs of a logic stage have completed their proper evaluation within the given clock cycle, by augmenting the pipeline registers with additional latches and comparison logic. Our technique is more specific since it is tailored precisely for addition circuits. The detection takes place at the logic stage rather than at the sampling registers, as in Razor. Secondly, Razor does not modify the underlying logic circuits, since it is not aware of their specific structure. Our technique, in contrast, relies on the specific properties of carry propagation, leading to a change in the adder's architecture. Finally, Razor trades a small performance loss for power savings, but does not tolerate inaccurate computation. Our method allows an inaccurate addition mode (with very low probability) for special-purpose applications that can tolerate such inaccuracies.

In an effort to increase the overall throughput, a technique called telescopic units [9] took advantage of the data dependence of the worst-case carry propagation. The improvement is achieved by speeding up the clock signal, such that its cycle suffices for the common input cases. This paper extends the idea further into multi-mode addition by using simple carry-completion control logic. We analyze its energy and performance advantages and demonstrate a good match between the theoretical energy-performance model and circuit simulations. It is shown how different processor architectures can take advantage of multi-mode addition and how the probabilistic model matches the results obtained from real code running on a 32-bit MIPS processor.

The rest of the paper is organized as follows. Section II reviews prior work related to the adder proposed in this paper. Section III describes the architecture of the proposed multi-mode adder. Section III-A describes how the multi-mode adder enables voltage scaling. Section III-B presents the carry propagation probabilities and the multi-mode addition that form the basis of the analysis. Energy consumption is analyzed in Section III-C. Section IV presents results and discussion, and Section V concludes the paper.

### II. EXISTING METHOD

The existing adder [10] is designed for carry chains of length $O(\log_2 n)$, but must still properly handle longer carry chains. To this end we need to detect whether the longest carry chain occurring in an addition exceeds a certain limit of k bits. If it does, the adder must modify its operation and allocate more time so that a correct result is produced. These issues are addressed in the existing n-bit adder architecture, called the dual-mode adder. It targets k-bit carry propagation in its most probable operation mode, called normal, while another mode, in which additional clock cycles are required to properly complete the addition, is called extended.

A dual-mode adder works either in a single clock cycle, called normal mode, or in m cycles, called extended mode. A single clock cycle suffices when the longest carry does not exceed k bits. Otherwise, the adder uses m − 1 additional clock cycles to properly complete its computation. The dual-mode adder targets the delay of a k-bit group rather than the worst case of n = mk bits, which ordinary adders target. In its most likely normal mode it uses low power and energy. The extended mode takes place in those few cases where the carry propagates across more than k bits, in which it consumes more time, power and energy.
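The mode decision of the dual-mode adder can be illustrated with a small behavioral sketch. It is not the circuit of [10]: the function names are hypothetical, and the chain length is assumed here to count the generating bit plus the consecutive propagating bits above it.

```python
def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Length of the longest carry chain when adding two n-bit operands.

    Assumption (not from the paper): a chain starts at a bit that generates
    a carry (a_i AND b_i) and continues through the consecutive bits above
    it that propagate the carry (a_i XOR b_i).
    """
    generate = a & b
    propagate = a ^ b
    longest = 0
    current = 0
    for i in range(n):
        if (generate >> i) & 1:
            current = 1                 # a new carry is born at bit i
        elif (propagate >> i) & 1 and current:
            current += 1                # the live carry moves one bit further
        else:
            current = 0                 # carry killed, or no live carry
        longest = max(longest, current)
    return longest


def dual_mode_cycles(a: int, b: int, n: int, k: int) -> int:
    """Clock cycles spent by a behavioral dual-mode adder with k-bit groups."""
    m = n // k                          # number of k-bit groups, n = m * k
    return 1 if longest_carry_chain(a, b, n) <= k else m


if __name__ == "__main__":
    # 0xFF + 0x01 ripples a carry across all 8 bits: extended mode.
    print(dual_mode_cycles(0xFF, 0x01, n=8, k=4))
    # 0x0F + 0x10 generates no carry at all: normal mode.
    print(dual_mode_cycles(0x0F, 0x10, n=8, k=4))
```

The sketch only decides how many cycles an addition is charged; it says nothing about how the hardware detects the long chain.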

### III. PROPOSED MULTI-MODE ADDER ARCHITECTURE

The operation of a multi-mode adder was tested in a real processor running test programs. To this end we replaced the ALU's ordinary adder of a 32-bit pipelined MIPS processor with a multi-mode one.

Fig. 1 Block diagram of multi-mode adder embedded in a pipelined MIPS processor

Figure 1 depicts how a multi-mode adder is integrated in the MIPS ALU. This requires only replacing the existing adder with a multi-mode one and introducing a clock gater, as described below. Notice that the registers in Figure 1 are not new, but are parts of the ordinary pipeline registers, storing the ALU's operands and result. A and B are part of the Instruction Decode/Execute pipeline register, while SUM is part of the Execute/Memory Access pipeline register. The operation SUM = A + B starts when the arguments are loaded into registers A and B. The mode decision module uses a carry-completion circuit, operating simultaneously with the adder. The signal "all\_done" gates the global clock signal "global-CLK".

If normal mode is validated, a case where "all\_done" is asserted within a single cycle (the most frequent case), the clock gater passes the global clock, and the result is loaded into SUM after one cycle. If more cycles are required for the assertion of "all\_done", "Pipeline\_CLK" is delayed by a suitable number of clock cycles, allowing the adder to complete properly. "Pipeline\_CLK" synchronizes the entire pipeline, so the whole pipe is delayed until the addition completes. Other adders, e.g. those used for address calculations, can equally be replaced by multi-mode ones. That needs an additional gating level to AND the individual "Pipeline\_CLK" signals, so that the clock is deactivated whenever any of the adders needs more time. Other than in the adder, no logic signal switches during the extra cycle of the sub-normal addition mode. Hence dynamic energy is not wasted.
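The gating behavior described above can be sketched at the cycle level. This is only a behavioral illustration under simplifying assumptions: the function names are hypothetical, each addition is represented just by the number of global clock cycles it needs before "all\_done" is asserted, and "Pipeline\_CLK" is modeled as one boolean per global clock cycle.

```python
def pipeline_clk_trace(cycles_per_add):
    """One boolean per global clock cycle: True when Pipeline_CLK pulses.

    cycles_per_add lists, for each successive addition, how many global
    clock cycles pass before all_done is asserted (1 = normal mode,
    2 = sub-normal mode, more = extended modes).
    """
    trace = []
    for cycles in cycles_per_add:
        if cycles < 1:
            raise ValueError("an addition takes at least one cycle")
        trace.extend([False] * (cycles - 1))   # clock gated: pipeline stalls
        trace.append(True)                     # all_done: result is latched
    return trace


def combined_enable(all_done_signals):
    """AND of the per-adder enables when several multi-mode adders share the pipe."""
    return all(all_done_signals)


if __name__ == "__main__":
    # Normal, sub-normal, normal: the pipeline is stalled for one cycle in total.
    print(pipeline_clk_trace([1, 2, 1]))
    print(combined_enable([True, False]))
```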

### A. ENERGY SAVING IN MULTI-MODE ADDERS BY VOLTAGE SCALING

We consider a 64-bit carry-propagate adder (CPA) designed in a 65 nm TSMC process technology and operated at a 1.3 V supply voltage. It is well known that the expected longest carry propagation in n-bit addition is $O(\log_2 n)$. Rather than considering carry propagation along all 64 bits, we divide the adder into several k-bit groups. We found by SPICE simulations how far the supply voltage can be scaled while a k-bit group still computes properly. The operands were set such that a carry-in pulse at the group's LSB propagates through all of its bits.

The k-bit group includes a pass-gate carry propagation chain combined with a per-bit addition circuit. The corresponding energies were obtained by integrating the current-voltage product over time. It is shown later how the multi-mode adder takes advantage of voltage scaling from 1.3 V to 0.95 V, yielding a considerable energy reduction. These energies are later required to compute the energy consumed by an addition operation as part of the entire computing system, where the adder is integrated in a pipelined processor.

### B. CARRY PROPAGATION PROBABILITIES AND MULTI-MODE ADDITION

The dual-mode adder architecture divides an n-bit CPA into m = n / k independent groups of k bits each, using a fast pass-gate carry-chain circuit architecture. As shown subsequently, the optimal value of k used in multi-mode adders does not exceed ten. For such a size, fast circuit architectures such as carry-skip or prefix-tree adders do not offer any speed advantage over a pass-gate carry chain.

Additions are properly computed within a single clock cycle with high probability. A dual-mode adder works either in a single clock cycle, called normal mode, or in m cycles, called extended mode. A single clock cycle suffices when the longest carry does not exceed k bits. Otherwise, the adder uses m − 1 additional clock cycles to properly complete its computation. The probability $p_{norm}$ that the longest carry does not exceed k bits in any of the m groups, a case where a single clock cycle suffices, is

$$p_{norm}(k,m) = (1-2^{-k})^m = 1-m2^{-k} + O(m^2 2^{-2k}) > 1-m2^{-k} \tag{1}$$
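Equation (1) can be checked exhaustively for small adders. The sketch below assumes one particular reading of (1), which is not spelled out in the text: the operands are uniformly distributed, and normal mode fails exactly when some k-bit group is entirely in the propagate state (every bit pair differs), an event of probability $2^{-k}$ per group. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def has_full_propagate_group(a: int, b: int, k: int, m: int) -> bool:
    """True if some k-bit group has all of its bits in the propagate state."""
    propagate = a ^ b
    mask = (1 << k) - 1
    return any(((propagate >> (g * k)) & mask) == mask for g in range(m))


def p_norm_exhaustive(k: int, m: int) -> Fraction:
    """Fraction of all operand pairs for which no group is fully propagating."""
    n = k * m
    hits = sum(
        1
        for a, b in product(range(1 << n), repeat=2)
        if not has_full_propagate_group(a, b, k, m)
    )
    return Fraction(hits, 1 << (2 * n))


def p_norm_formula(k: int, m: int) -> Fraction:
    """Closed form of Eq. (1): (1 - 2^-k)^m."""
    return (1 - Fraction(1, 2 ** k)) ** m


if __name__ == "__main__":
    for k, m in [(2, 2), (3, 2)]:
        print(k, m, p_norm_exhaustive(k, m), p_norm_formula(k, m))
```

The enumeration grows as $4^n$, so it is only practical for toy sizes; for realistic n the closed form is used directly.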

To overcome the disadvantage of the dual-mode adder, which spends m cycles in its extended mode, the multi-mode adder reduces the number of extra cycles by using simple supplementary carry-completion logic. Such logic validates that all the adder's bits have already received their proper carry-in signal, thus ensuring a proper addition result. We call sub-normal the mode where proper addition requires two clock cycles. The probability that a carry is generated or killed within each of the m/2 groups of 2k bits is

$$(1-2^{-2k})^{m/2} = 1 - m2^{-(2k+1)} + O(m^2 2^{-4k}) > 1 - m2^{-(2k+1)} \tag{2}$$

The probability $p_{sub-norm}$ that sub-normal mode occurs is therefore

$$\begin{aligned} p_{\text{sub-norm}}(k,m) &= (1-2^{-2k})^{m/2} - p_{\text{norm}}(k,m) \\ &= (1-2^{-2k})^{m/2} - (1-2^{-k})^{m} \simeq m 2^{-k} \end{aligned} \tag{3}$$
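To get a feel for the magnitudes in (1)–(3), the probabilities can be evaluated numerically. The sketch below uses n = 64 and k = 8 purely as an illustrative operating point; the function names are hypothetical.

```python
def p_norm(k: int, m: int) -> float:
    """Eq. (1): probability that one clock cycle suffices."""
    return (1.0 - 2.0 ** -k) ** m


def p_within_two_cycles(k: int, m: int) -> float:
    """Eq. (2): probability that no 2k-bit group is crossed by a carry."""
    return (1.0 - 2.0 ** (-2 * k)) ** (m / 2)


def p_sub_norm(k: int, m: int) -> float:
    """Eq. (3): probability that exactly two clock cycles are needed."""
    return p_within_two_cycles(k, m) - p_norm(k, m)


if __name__ == "__main__":
    n, k = 64, 8            # illustrative sizes only
    m = n // k
    print("p_norm      =", p_norm(k, m))
    print("p_sub_norm  =", p_sub_norm(k, m))
    print("m * 2^-k    =", m * 2.0 ** -k)   # first-order estimate in (3)
```

For these sizes the first-order estimate $m2^{-k}$ is within about $5 \times 10^{-4}$ of the exact value of $p_{sub-norm}$.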

### C. MODELLING AND ANALYSING THE ENERGY CONSUMPTION

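Let us consider the total energy $E_{tot}$ consumed by a multi-mode adder integrated in a computing system. Let E denote the energy consumed by a single-cycle addition, of which a fraction $\lambda$ is static and $1-\lambda$ is dynamic. While the dynamic energy depends only on the logic switching, and hence is independent of (k, m), the static energy is consumed in every clock cycle and therefore depends on the number of cycles an addition takes. Neglecting additions that need more than two cycles, so that $p_{norm} + p_{sub-norm} \simeq 1$, it follows from (1) and (3) that the adder's total energy consumption is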

$$\begin{aligned} E_{tot} &= p_{norm} E + p_{sub-norm} \left[(1-\lambda) E + 2\lambda E\right] \\ &\simeq E \left(1+\lambda p_{sub-norm}\right) \end{aligned} \tag{4}$$

Energy is not the only concern. Recall that with some probability the multi-mode adder spends more than one cycle to compute properly. We therefore consider the product of the expected computation time and the energy. Denoting by T the clock cycle and by $T_{ave}$ the average addition time, the energy-delay product is

$$E_{\text{tot}} T_{\text{ave}} \simeq E \left(1 + \lambda p_{\text{sub-norm}}\right) T \left(p_{\text{norm}} + 2 p_{\text{sub-norm}}\right) \tag{5}$$
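Equations (4) and (5) are simple enough to evaluate directly. The following sketch treats E, $\lambda$, T and the two mode probabilities as plain inputs; the function names and the numbers in the example are illustrative assumptions, not measured values.

```python
def total_energy(E: float, lam: float, p_sub_norm: float) -> float:
    """Eq. (4): E_tot ~= E * (1 + lambda * p_sub_norm).

    E   -- energy of a single-cycle addition
    lam -- static fraction of E (0 <= lam <= 1)
    """
    return E * (1.0 + lam * p_sub_norm)


def average_time(T: float, p_norm: float, p_sub_norm: float) -> float:
    """Average addition time: one cycle in normal mode, two in sub-normal mode."""
    return T * (p_norm + 2.0 * p_sub_norm)


def energy_delay_product(E, lam, T, p_norm, p_sub_norm) -> float:
    """Eq. (5): product of total energy and average addition time."""
    return total_energy(E, lam, p_sub_norm) * average_time(T, p_norm, p_sub_norm)


if __name__ == "__main__":
    # Illustrative numbers only: normalized energy, 2 ns cycle, 20% static share.
    print(energy_delay_product(E=1.0, lam=0.2, T=2.0, p_norm=0.97, p_sub_norm=0.03))
```

Because $p_{sub-norm}$ is small, both factors stay close to their single-cycle values E and T, which is the point of the multi-mode design.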

The parameters $V_{dd}$ and k, in contrast, are design choices. Let us therefore find the dependency of $V_{dd}$ on k such that the carry completes its propagation through a group within a single clock cycle. The intrinsic delay t of a unit-size CMOS pass-gate transistor is

$$t = \frac{\alpha V_{dd}}{(V_{dd} - V_{th})^2} \tag{6}$$

The propagation delay $t_k$ of a pass-gate carry chain grows quadratically with k, yielding

$$t_{k} = k^{2} \frac{\alpha V_{dd}}{(V_{dd} - V_{th})^{2}} \tag{7}$$

Ignoring the registers' setup and propagation delays, given a clock cycle $t_k$, one can derive the smallest supply voltage required to operate the carry-chain transistors such that the propagation delay does not exceed $t_k$. Solving (7) for $V_{dd}$ yields

$$V_{dd} = V_{th} + \frac{\alpha k^2}{2t_k} + \sqrt{\frac{\alpha V_{th} k^2}{t_k} + \frac{\alpha^2 k^4}{4t_k^2}} \tag{8}$$
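The step from (7) to (8) is the solution of a quadratic in $V_{dd}$: writing $c = \alpha k^2 / t_k$, (7) becomes $(V_{dd} - V_{th})^2 = c\,V_{dd}$, and the root with $V_{dd} > V_{th}$ is $V_{th} + c/2 + \sqrt{cV_{th} + c^2/4}$. The sketch below checks this numerically by substituting the result back into (7). The values of $\alpha$ and $V_{th}$ are placeholders chosen for illustration, not technology data, and the function names are hypothetical.

```python
import math


def chain_delay(k: int, vdd: float, alpha: float, vth: float) -> float:
    """Eq. (7): delay of a k-bit pass-gate carry chain."""
    return k ** 2 * alpha * vdd / (vdd - vth) ** 2


def min_vdd(k: int, t_clk: float, alpha: float, vth: float) -> float:
    """Smallest supply voltage for which the k-bit chain settles within t_clk.

    Larger root of (vdd - vth)^2 = c * vdd with c = alpha * k^2 / t_clk;
    the smaller root lies below vth and is not physically meaningful.
    """
    c = alpha * k ** 2 / t_clk
    return vth + c / 2.0 + math.sqrt(vth * c + c * c / 4.0)


if __name__ == "__main__":
    alpha, vth = 1e-12, 0.3       # placeholder values, assumed for illustration
    t_clk = 2e-9                  # 500 MHz clock
    for k in (4, 8, 16):
        v = min_vdd(k, t_clk, alpha, vth)
        print(k, v, chain_delay(k, v, alpha, vth))
```

Substituting the returned voltage into `chain_delay` reproduces the clock cycle, and the required voltage rises with k, reflecting the quadratic growth in (7).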

Knowing $V_{dd}$ from (8) allows deriving the dynamic energy consumed by the adder, expressed by $E_{dyn} = CV_{dd}^2$, where C is the adder's underlying total capacitance. The total energy $E_{tot}$ can then be found from $E_{dyn}$, $\lambda$ and the mode probabilities. The term $(1-\lambda)E$ in (4) is the dynamic energy $E_{dyn}$, whereas $\lambda E$ is the static energy consumed during a single clock cycle, expressed by $\lambda E = \lambda E_{dyn} / (1-\lambda)$. Recalling that m = n/k, substitution yields, for the case where every addition outside normal mode is charged the static energy of m cycles,

$$\begin{aligned} E_{\text{tot}} = {} & C\left(V_{\text{th}} + \frac{\alpha k^2}{2t_k} + \sqrt{\frac{\alpha V_{\text{th}} k^2}{t_k} + \frac{\alpha^2 k^4}{4t_k^2}}\right)^2 \\ & \times \left\{(1-2^{-k})^{n/k} + \left[1-(1-2^{-k})^{n/k}\right]\left(\frac{n\lambda}{k} + 1 - \lambda\right)\right\} \end{aligned} \tag{9}$$
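Equation (9) can be swept over the group size k to locate an energy-optimal design point. The sketch below is one possible reading of (9), with the exponent taken as m = n/k and the supply voltage taken from (8). All parameter values ($\alpha$, $V_{th}$, C, $\lambda$) are placeholders assumed for illustration, so the reported optimum has no quantitative meaning; the function names are hypothetical.

```python
import math


def min_vdd(k: int, t_clk: float, alpha: float, vth: float) -> float:
    """Eq. (8): smallest supply voltage for a k-bit chain to settle in t_clk."""
    c = alpha * k ** 2 / t_clk
    return vth + c / 2.0 + math.sqrt(vth * c + c * c / 4.0)


def total_energy_eq9(k, n, t_clk, alpha, vth, cap, lam) -> float:
    """Eq. (9): C * Vdd^2 times the mode-dependent factor in braces."""
    m = n / k
    vdd = min_vdd(k, t_clk, alpha, vth)
    p_norm = (1.0 - 2.0 ** -k) ** m
    # Outside normal mode: dynamic share once, static share for m cycles.
    mode_factor = p_norm + (1.0 - p_norm) * (m * lam + 1.0 - lam)
    return cap * vdd ** 2 * mode_factor


def best_group_size(n, t_clk, alpha, vth, cap, lam) -> int:
    """Group size k (a divisor of n, k >= 2) that minimizes Eq. (9)."""
    candidates = [k for k in range(2, n + 1) if n % k == 0]
    return min(
        candidates,
        key=lambda k: total_energy_eq9(k, n, t_clk, alpha, vth, cap, lam),
    )


if __name__ == "__main__":
    params = dict(t_clk=2e-9, alpha=1e-12, vth=0.3, cap=1e-12, lam=0.2)  # assumed
    for k in (2, 4, 8, 16, 32, 64):
        print(k, total_energy_eq9(k, 64, **params))
    print("best k:", best_group_size(64, **params))
```

Small groups allow a low supply voltage but leave normal mode often, while large groups rarely leave normal mode but need a high voltage; the sweep makes that trade-off visible.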

### IV. RESULTS AND DISCUSSION

To carry out the energy-performance study of the multi-mode adder, a 64-bit DesignWare adder was first synthesized in a 65 nm CMOS technology for 500 MHz and a nominal 1 V supply voltage using the LTspice tool, as shown in Fig. 2. That adder is used as a reference for the alternative multi-mode adders. It is shown that all of them could meet that frequency with large time margins. The time margins are used to reduce the supply voltage $V_{dd}$ such that the switching point moves rightwards to the 2.0 ns point. The delay grows quadratically with the group size because of the group's pass-gate carry chain, as expressed in (7). Whenever the latch output is 1, pipeline\_clk is generated. Otherwise, the output is zero.



Fig. 2 Output for multi-mode adder

We designed a 64-bit adder with 8-bit groups, such that it meets the 500 MHz clock frequency with the smallest possible supply voltage. The energy comparison must account for the sub-normal and extended modes, where more than one cycle is required for proper addition, and therefore additional static power is consumed. To find the energy efficiency of multi-mode addition compared to ordinary addition, the corresponding energies (power multiplied by the clock cycle) are substituted in (4) and then normalized by the energy consumed by the synthesized adder.

| Supply voltage (V) | Power (mW) | Energy (fJ) | Delay (ps) |
|--------------------|------------|-------------|------------|
| 1.3                | 4.1901     | 4.184       | 916.48     |
| 1                  | 1.2744     | 1.292       | 873.58     |
| 0.95               | 1.2150     | 1.853       | 785.44     |
Table 1 Analysis of multimode adder with different supply voltages

The multi-mode adder, using a 64-bit carry-propagate adder, was designed for 1.3 V, 1 V and 0.95 V supply voltages. All designs were simulated with the LTspice and HSPICE tools, and the results are summarized in Table 1. The energy figures closely track the power figures, since the probability that the adder requires more than one cycle is very small. While the energy was cut to 1.85 fJ when scaling down to 0.95 V, compared with 4.18 fJ at 1.3 V, operating at that voltage might be too risky because of sensitivity to process variations.

Scaling the supply voltage down from 1.3 V reduces both power and energy.

### V. CONCLUSION AND FUTURE WORK

This work has shown how energy efficiency can be achieved by taking advantage of the infrequent worst-case carry propagation occurring in addition. When the number of bits per group is reduced, the voltage can be scaled down, and the resulting potential energy savings were studied. The validity of the multi-mode addition approach has been shown in the design of a 32-bit pipelined MIPS processor. Multi-mode addition can also be very useful in special architectures such as image processors. Because the adder's mode is selected according to the data by mode decision logic and clock gating, power is reduced. It would be interesting to study the implications of the sub-normal mode probability on the energy and performance of such pipelines. In future work, FinFET 32 nm technology could be used instead of CMOS 65 nm technology to obtain better results, i.e., lower power consumption, energy and delay.

### REFERENCES

- [1] Benini L., Macii E., Poncino M. and De Micheli G. (1998), 'Telescopic units: a new paradigm for performance optimization of VLSI designs', IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17, no. 3, pp. 220–232.
- [2] Chen Y., Li H., Koh C.K., Sun G., Li J., Xie Y. and Roy K. (2010), 'Variable latency adder (VL-adder) designs for low power and NBTI tolerance', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 1, pp. 1621–1624.
- [3] Das S., Dasika G.S., Shivashankar K. and Bull D. (2014), 'A 1 GHz hardware loop-accelerator with razor based dynamic adaptation for energy efficient operation', IEEE Trans. Circuits Syst. I: Regul. Pap., vol. 61, no. 8, pp. 2290–2298.
- [4] Ernst D., Das S., Lee S., Blaauw D., Austin T., Mudge T., Kim N.S. and Flautner K. (2004), 'Razor: circuit level correction of timing errors for low-power operation', IEEE Micro, vol. 6, pp. 10–20.
- [5] Freking R.A. and Parhi K.K. (1998), 'Theoretical estimation of power consumption in binary adders', in Proceedings of the IEEE International Symposium on Circuits and Systems.
- [6] Gupta V., Debabrata M., Anand R. and Kaushik R. (2013), 'Low-power digital signal processing using approximate adders', IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137.
- [7] Koren I. (2002), Computer arithmetic algorithms, Universities Press.
- [8] Kulkarni P., Gupta P. and Ercegovac M. (2011), 'Trading accuracy for power with an underdesigned multiplier architecture', in Proceedings of the 24<sup>th</sup> International Conference on VLSI Design, vol. 0, pp. 346–351.
- [9] LaGuiadeSolaz M. and Conway R. (2015), 'Razor based programmable truncated multiply and accumulate, energy reduction for efficient digital signal processing', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 1, pp. 189–193.
- [10] Wimer S., Albeck A. and Koren I. (2014), 'A low energy dual-mode adder', Comput. Electr. Eng., vol. 40, no. 5, pp. 1524–1537.

