Monday 4 September 2023

CLOCK TREE SYNTHESIS - PART3

 CLOCK BUFFER AND MINIMUM PULSE WIDTH VIOLATION


Transition (slew): Slew is a rate of change. In static timing analysis (STA), rising and falling waveforms are characterized by how fast or slow the transition (slew) is. Slew is typically measured as transition time, i.e. the time a signal takes to move between two specified levels (low to high or high to low). Transition time is the inverse of the slew rate: the larger the transition time, the slower the slew, and vice versa.


In the .lib file these transition thresholds are defined as:
# rising edge thresholds:
slew_lower_threshold_pct_rise : 20.0;
slew_upper_threshold_pct_rise : 80.0;
# falling edge thresholds:
slew_upper_threshold_pct_fall : 80.0;
slew_lower_threshold_pct_fall : 20.0;
These values are specified as a percentage of VDD.
Rise time: The time required for a signal to transition from 20% of VDD to 80% of VDD.

Fall time: The time required for a signal to transition from 80% of VDD to 20% of VDD.

Propagation delay: The time between a transition at the input of a cell and the resulting transition (0 to 1 or 1 to 0) at its output, measured between the delay threshold points of the two waveforms, typically 50% of VDD.
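The 20%/80% rise time definition above can be sketched in a few lines of Python. This is only an illustration: the function names, the piecewise-linear waveform and the 100 ps ramp are assumptions made for the example and are not taken from any STA tool.

```python
def threshold_crossing_time(times, volts, level):
    """Linearly interpolate the first time a rising waveform crosses `level`."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 <= level <= v1 and v1 != v0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts, vdd, lower_pct=20.0, upper_pct=80.0):
    """Rise time between the lower and upper slew thresholds (percent of VDD)."""
    t_low = threshold_crossing_time(times, volts, vdd * lower_pct / 100.0)
    t_high = threshold_crossing_time(times, volts, vdd * upper_pct / 100.0)
    return t_high - t_low


# Hypothetical ideal ramp: 0 V to 1.0 V in 100 ps (times in ps, volts in V).
times = [0.0, 100.0]
volts = [0.0, 1.0]
rt = rise_time(times, volts, vdd=1.0)
print(f"20%-80% rise time: {rt:.1f} ps")
```

For this ideal ramp the 20% to 80% window covers 60% of the full swing, so the measured rise time is about 60 ps.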

Clock buffer and normal buffer
The clock net is a high-fanout net and the most active signal in the design. Clock buffers are used mainly for clock distribution, to build the clock tree. The main goal of CTS is to meet the skew and insertion delay targets, and for this we insert buffers in the clock path. If a buffer has different rise and fall delays, it changes the duty cycle. The tool can still optimize skew in this condition, but the whole optimization becomes more complicated because the tool has to deal with a clock whose duty cycle differs from one flop path to another. If the rise and fall delays are the same, the only thing the tool has to do is balance the delay by inserting buffers.

Clock buffers are designed with special properties such as high drive strength, equal rise and fall times, low delay, and low delay variation across PVT and OCV. The equal rise and fall times prevent the duty cycle of the clock signal from changing as it passes through a chain of clock buffers.

A perfect clock tree is one that gives minimum insertion delay and a 50% duty cycle for the clock. The clock can maintain a 50% duty cycle only if the rise and fall delays and transitions of the tree cells are equal.

How do we decide whether to use buffers or inverters to build the clock tree in the clock tree synthesis stage? The decision depends on the libraries we are using. The main factors in choosing between an inverter and a buffer are the rise delay, fall delay, drive strength and insertion delay (latency) of the cell. In most libraries a buffer is a combination of two inverters, so an inverter has less delay than a buffer of the same drive strength. Inverters also have more driving capacity than buffers, which is why inverters are preferred over buffers for CTS with most libraries.

Clock buffers sometimes have their input and output pins on higher metal layers, so far fewer vias are needed at the clock distribution root. A normal buffer has its pins on lower metal layers such as metal1. Some libraries also have clock buffers with input pins on higher metal layers and output pins on lower metal layers. Clock routing is normally done on higher metal layers than signal routing, so placing the clock buffer pins on higher layers gives easier access to them from the clock routes.

Clock buffers are balanced, i.e. their rise and fall times are almost the same. If they are not equal, duty cycle distortion occurs in the clock tree, and this is where minimum pulse width violations come into the picture. In a clock buffer the PMOS is sized larger than the NMOS.

Normal buffers, on the other hand, do not have equal rise and fall times. In other words, they do not need a PMOS/NMOS size ratio of 2:1, i.e. the PMOS does not need to be bigger than the NMOS. Because of this a normal buffer is smaller than a clock buffer, and a clock buffer consumes more power.

The advantage of an inverter-based tree is that it gives equal rise and fall transitions. Each clock edge alternates between a rising and a falling transition in successive inverters, so the jitter (duty cycle jitter) cancels out and we get symmetrical high and low pulse widths.

A buffer contains two inverters of unequal area and unequal drive strength connected back to back: the first inverter is small with low drive strength, and the second inverter is large with high drive strength, as shown in the figure below.

So the loads on these two inverters are unequal. The net between the two back-to-back inverters is short, so its wire capacitance is small and can be neglected. The net to the next stage is longer, so the second inverter sees a larger capacitance made up of the wire capacitance plus the input pin capacitance of the next cell. The result is unequal rise and fall times, so jitter gets added to the clock tree, at the additional cost of more area than an inverter.

So inverter-based trees are mainly preferred over buffer-based trees.
Figure: inverter-based tree with equal rise and fall times

Figure: buffer-based tree with unequal rise and fall times

Why is the PMOS bigger than the NMOS?

The majority charge carriers in an NMOS are electrons, and in a PMOS they are holes. Electrons have higher mobility than holes.
Since electron mobility is greater than hole mobility, the PMOS width must be larger to compensate and make the pull-up network stronger. If the W/L of the PMOS were the same as that of the NMOS, the charging time of the output node would be longer than the discharging time, because charging goes through the pull-up (PMOS) network and discharging through the pull-down (NMOS) network.
So we make the PMOS bigger to get equal rise and fall times.
Normal buffers are designed with a W/L ratio such that the sum of the rise and fall times is minimum.


Normally, for the same W/L, R(PMOS) > R(NMOS), roughly:

R(PMOS) = 3 * R(NMOS)
To make the resistances of the two transistors equal, the PMOS is made bigger than the NMOS.
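The sizing argument above can be checked with a small first-order sketch. It assumes that on-resistance is inversely proportional to transistor width at a fixed channel length, and it uses the 3x resistance ratio quoted above. The function names and unit values are hypothetical and chosen only for illustration.

```python
def on_resistance(r_unit, width):
    """First-order model: on-resistance falls inversely with width (fixed L)."""
    return r_unit / width


def pmos_width_for_balance(nmos_width, resistance_ratio=3.0):
    """PMOS width that matches the NMOS on-resistance under the model above."""
    return nmos_width * resistance_ratio


# Hypothetical normalized values: the NMOS has resistance 1.0 at unit width,
# and a PMOS of the same size is assumed to be 3x more resistive.
R_N_UNIT = 1.0
R_P_UNIT = 3.0 * R_N_UNIT

wn = 1.0
wp = pmos_width_for_balance(wn)
r_n = on_resistance(R_N_UNIT, wn)
r_p = on_resistance(R_P_UNIT, wp)
print(f"Wp/Wn = {wp / wn:.1f}, R(NMOS) = {r_n:.2f}, R(PMOS) = {r_p:.2f}")
```

With the PMOS three times wider, both resistances come out equal in this model, which is the condition for matched pull-up and pull-down strength. Real libraries pick the ratio from characterization data, so the 3x figure should be read as a rule of thumb.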



Duty cycle of the clock:
It is the fraction of one clock period during which the clock signal is in the high (active) state. A period is the time the clock signal takes to complete one full high-and-low cycle. Duty cycle (D) is expressed as a percentage (%).
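As a quick illustration of the definition, the sketch below computes the duty cycle from a high time and a period, and the high and low pulse widths from a frequency and a duty cycle. The function names and the 1 GHz example values are assumptions made for this example.

```python
def duty_cycle_pct(high_time, period):
    """Duty cycle D in percent: time spent high divided by the period."""
    return 100.0 * high_time / period


def pulse_widths_ns(frequency_ghz, duty_pct=50.0):
    """High and low pulse widths in ns for a given frequency and duty cycle."""
    period_ns = 1.0 / frequency_ghz
    high_ns = period_ns * duty_pct / 100.0
    return high_ns, period_ns - high_ns


# Hypothetical clock: 1 GHz (1 ns period) with a 50% duty cycle.
high_ns, low_ns = pulse_widths_ns(1.0)
d = duty_cycle_pct(high_ns, high_ns + low_ns)
print(f"high = {high_ns} ns, low = {low_ns} ns, duty cycle = {d}%")
```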

Minimum pulse width violation:
The clock pulse must be wide enough for sequential and combinational cells to function properly. The width of the clock pulse has to be sufficient for the internal operation of the cell, i.e. the minimum pulse width of the clock has to be maintained for correct output. Otherwise the cell can go into a metastable state and we will not get the correct output.

In other words, the clock pulse into the flop/latch must be wide enough that it does not interfere with the correct functionality of the cell.

Minimum pulse width checks ensure that the widths of the high and low pulses of the clock signal are more than the required value.

This violation basically depends on the frequency of operation and the technology we are working with. If the design frequency is 1 GHz, the period is 1 ns, and with a 50% duty cycle the high and low pulses are each 0.5 ns wide.
In most designs the duty cycle is kept at 50% for simplicity; otherwise the designer can face issues such as clock distortion and minimum pulse width violations. If the design uses half-cycle paths, meaning data is launched at the positive edge and captured at the negative edge, the minimum pulse width matters again, because the high and low pulse widths will not be the same. If there is a long chain of inverters and buffers, the pulse can even vanish completely.
For the clock path we normally use clock buffers because their rise and fall delays are equal, whereas normal buffers have unequal delays, which is why we have to check the minimum pulse width.
Why minimum pulse width violations occur:
They occur due to the unequal rise and fall delays of combinational cells. Take the example of a clock signal with 1 GHz frequency (1 ns period) entering a buffer. If the rise delay is more than the fall delay, the high pulse of the output clock will be narrower than the high pulse of the input clock.

The difference between the rise and fall delays is 0.007 ns
High pulse: 0.5 - 0.007 = 0.493 ns
Low pulse: 0.5 + 0.007 = 0.507 ns
We can understand it with an example:
Suppose a clock signal passes through a chain of buffers, each with different rise and fall delays. We can calculate how this affects the low and high pulses of the clock signal. The high pulse width shrinks at every buffer whose rise delay is more than its fall delay.
Here every buffer in the chain takes more time to charge than to discharge, so as the clock signal propagates through the long chain of buffers the high pulse width is reduced, as shown below.
The calculation is:

High pulse width = half period of the clock signal - sum over all buffers of (rise delay - fall delay)
                                = 0.5 - (0.055 - 0.048) - (0.039 - 0.032) - (0.025 - 0.022) - (0.048 - 0.043) - (0.058 - 0.054) = 0.474 ns

Low pulse width = half period of the clock signal + sum over all buffers of (rise delay - fall delay)
                               = 0.5 + (0.055 - 0.048) + (0.039 - 0.032) + (0.025 - 0.022) + (0.048 - 0.043) + (0.058 - 0.054) = 0.526 ns

Suppose the required minimum pulse width is 0.410 ns and the uncertainty is 90 ps.
Then high pulse width = 0.474 - 0.090 = 0.384 ns
Slack = 0.384 - 0.410 = -0.026 ns

Here we get a minimum pulse width violation for the high pulse, as the total high pulse width is less than the required value.

If we did not consider the uncertainty, the slack would be 0.474 - 0.410 = +0.064 ns and no violation would occur in this scenario.
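The same calculation can be reproduced with a short Python sketch. It assumes non-inverting buffers, so each stage narrows the high pulse by (rise delay - fall delay) and widens the low pulse by the same amount. The function names are hypothetical and the delay values are the ones from the example above.

```python
def pulse_widths_after_chain(half_period, stage_delays):
    """High and low pulse widths after a chain of non-inverting buffers.

    stage_delays is a list of (rise_delay, fall_delay) pairs, one per buffer.
    """
    shrink = sum(rise - fall for rise, fall in stage_delays)
    return half_period - shrink, half_period + shrink


def min_pulse_width_slack(pulse_width, required, uncertainty=0.0):
    """Slack = (pulse width - uncertainty) - required; negative means violation."""
    return pulse_width - uncertainty - required


# (rise delay, fall delay) in ns for the five buffers in the example.
stages = [(0.055, 0.048), (0.039, 0.032), (0.025, 0.022),
          (0.048, 0.043), (0.058, 0.054)]

high, low = pulse_widths_after_chain(0.5, stages)
slack = min_pulse_width_slack(high, required=0.410, uncertainty=0.090)
slack_no_unc = min_pulse_width_slack(high, required=0.410)

print(f"high = {high:.3f} ns, low = {low:.3f} ns")
print(f"slack with uncertainty    = {slack:+.3f} ns")
print(f"slack without uncertainty = {slack_no_unc:+.3f} ns")
```

The sketch gives a high pulse of 0.474 ns and a low pulse of 0.526 ns, a slack of -0.026 ns with the 90 ps uncertainty, and a positive slack without it, matching the hand calculation.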

How to fix violations if they are present in the design:
Replace the clock tree cells with cells that have equal rise and fall delays, or use cells that have a smaller difference between rise and fall delays.

What problems occur if there is a pulse width violation:
  • Sequential data might not be captured properly, and the flop can go into a metastable state.
  • In some logic circuits the entire pulse could disappear, so no new data is captured.

So we must ensure that every circuit element always gets a clock pulse wider than the required minimum pulse width. Only then will the design be free of this violation.
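To see how a long chain can eat into the pulse, the sketch below walks a chain stage by stage and reports the first buffer at which the high pulse drops below the required width. It uses integer picoseconds to avoid rounding issues. The function name and the 7 ps per-stage shrink are hypothetical values chosen for illustration.

```python
def first_violating_stage(half_period_ps, required_ps, shrink_per_stage_ps):
    """Return (stage number, pulse width) at the first violation.

    shrink_per_stage_ps lists (rise delay - fall delay) for each buffer in ps.
    Returns (None, final width) if the pulse never drops below required_ps.
    """
    width = half_period_ps
    for stage, shrink in enumerate(shrink_per_stage_ps, start=1):
        width -= shrink
        if width < required_ps:
            return stage, width
    return None, width


# Hypothetical chain: 500 ps half period, 410 ps required width,
# and every buffer narrows the high pulse by 7 ps.
stage, width = first_violating_stage(500, 410, [7] * 20)
print(f"first violation at stage {stage}, high pulse = {width} ps")
```

With these numbers the pulse is still 416 ps wide after 12 buffers and falls to 409 ps at the 13th, which is where the check first fails.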

Two types of minimum pulse width checks are performed:
  • Clock pulse width check at sequential devices
  • Clock pulse width check at combinational circuits

How to report:
report_timing -check_type pulse_width

How to define pulse width:
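In the liberty file (.lib):
By default all the registers in the design have a minimum pulse width defined in the .lib file, as this is the format that conveys the standard cell requirements to the STA tool.
By convention the minimum pulse width is defined for the clock and reset pins.
Attribute name: min_pulse_width (for example the pin attributes min_pulse_width_high and min_pulse_width_low, or a timing arc with timing_type : min_pulse_width)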

In the SDC file (.sdc):
set_min_pulse_width -high 5 [get_clocks clk1]
set_min_pulse_width -low 4 [get_clocks clk1]
If neither -high nor -low is specified, the constraint applies to both the high and low pulses.

NOTE:
Balanced buffers are buffers with equal rise and fall times.
Unbalanced buffers are buffers with unequal rise and fall times.
