CTS produces clock trees with large max delay and large skew

Open antonblanchard opened this issue 3 years ago • 10 comments

I've been struggling with timing issues when taping out Microwatt on sky130. Part of the issue appears to be CTS which is producing clock trees with large max delay and large skew. An example is attached:

openroad run.tcl

Looking at the stats for user_clock2:

Clock user_clock2
Latency      CRPR       Skew
_138944_/CLK ^
   8.01
_132706_/CLK ^
   6.10     -0.13       1.78

8ns of max delay and 1.8ns of skew. If you look at one of the paths through the clock tree, there are a whole lot of buffers:

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock user_clock2 (rise edge)
                          0.00    0.00   clock source latency
                  0.61    0.48    0.48 ^ user_clock2 (in)
     1    0.14                           user_clock2 (net)
                  0.62    0.00    0.48 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
                  0.29    0.36    0.84 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
     1    0.33                           net621 (net)
                  0.63    0.29    1.13 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.06    0.28    1.40 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     2    0.02                           clknet_0_user_clock2 (net)
                  0.06    0.00    1.40 ^ clkbuf_1_1_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    1.53 ^ clkbuf_1_1_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_0_user_clock2 (net)
                  0.06    0.00    1.53 ^ clkbuf_1_1_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    1.66 ^ clkbuf_1_1_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_1_user_clock2 (net)
                  0.06    0.00    1.66 ^ clkbuf_1_1_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    1.79 ^ clkbuf_1_1_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_2_user_clock2 (net)
                  0.06    0.00    1.79 ^ clkbuf_1_1_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    1.92 ^ clkbuf_1_1_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_3_user_clock2 (net)
                  0.06    0.00    1.92 ^ clkbuf_1_1_4_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    2.06 ^ clkbuf_1_1_4_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_4_user_clock2 (net)
                  0.06    0.00    2.06 ^ clkbuf_1_1_5_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    2.19 ^ clkbuf_1_1_5_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_5_user_clock2 (net)
                  0.06    0.00    2.19 ^ clkbuf_1_1_6_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    2.32 ^ clkbuf_1_1_6_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_6_user_clock2 (net)
                  0.06    0.00    2.32 ^ clkbuf_1_1_7_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.42    0.38    2.71 ^ clkbuf_1_1_7_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.08                           clknet_1_1_7_user_clock2 (net)
                  0.43    0.02    2.73 ^ clkbuf_2_3_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.07    0.22    2.94 ^ clkbuf_2_3_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_2_3_0_user_clock2 (net)
                  0.07    0.00    2.95 ^ clkbuf_2_3_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    3.08 ^ clkbuf_2_3_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_2_3_1_user_clock2 (net)
                  0.06    0.00    3.08 ^ clkbuf_2_3_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    3.21 ^ clkbuf_2_3_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_2_3_2_user_clock2 (net)
                  0.06    0.00    3.21 ^ clkbuf_2_3_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    3.34 ^ clkbuf_2_3_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_2_3_3_user_clock2 (net)
                  0.06    0.00    3.34 ^ clkbuf_2_3_4_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    3.47 ^ clkbuf_2_3_4_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_2_3_4_user_clock2 (net)
                  0.06    0.00    3.47 ^ clkbuf_2_3_5_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    3.60 ^ clkbuf_2_3_5_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_2_3_5_user_clock2 (net)
                  0.06    0.00    3.60 ^ clkbuf_2_3_6_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.33    0.32    3.92 ^ clkbuf_2_3_6_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.06                           clknet_2_3_6_user_clock2 (net)
                  0.33    0.01    3.94 ^ clkbuf_3_7_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.07    0.21    4.14 ^ clkbuf_3_7_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_3_7_0_user_clock2 (net)
                  0.07    0.00    4.14 ^ clkbuf_3_7_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.07    0.14    4.28 ^ clkbuf_3_7_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_3_7_1_user_clock2 (net)
                  0.07    0.00    4.28 ^ clkbuf_3_7_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    4.41 ^ clkbuf_3_7_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_3_7_2_user_clock2 (net)
                  0.06    0.00    4.41 ^ clkbuf_3_7_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.21    0.24    4.65 ^ clkbuf_3_7_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.04                           clknet_3_7_3_user_clock2 (net)
                  0.21    0.00    4.65 ^ clkbuf_4_14_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.18    4.83 ^ clkbuf_4_14_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_4_14_0_user_clock2 (net)
                  0.06    0.00    4.83 ^ clkbuf_4_14_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    4.96 ^ clkbuf_4_14_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_4_14_1_user_clock2 (net)
                  0.06    0.00    4.96 ^ clkbuf_4_14_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.04    0.12    5.07 ^ clkbuf_4_14_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_4_14_2_user_clock2 (net)
                  0.04    0.00    5.07 ^ clkbuf_4_14_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.20    0.23    5.30 ^ clkbuf_4_14_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.04                           clknet_4_14_3_user_clock2 (net)
                  0.20    0.00    5.31 ^ clkbuf_5_28_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.07    0.18    5.48 ^ clkbuf_5_28_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_5_28_0_user_clock2 (net)
                  0.07    0.00    5.48 ^ clkbuf_5_28_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.20    0.23    5.72 ^ clkbuf_5_28_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.04                           clknet_5_28_1_user_clock2 (net)
                  0.20    0.00    5.72 ^ clkbuf_6_57_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.07    0.18    5.90 ^ clkbuf_6_57_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_6_57_0_user_clock2 (net)
                  0.07    0.00    5.90 ^ clkbuf_6_57_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.16    0.20    6.10 ^ clkbuf_6_57_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.03                           clknet_6_57_1_user_clock2 (net)
                  0.16    0.00    6.10 ^ clkbuf_7_115_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.17    0.24    6.34 ^ clkbuf_7_115_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.03                           clknet_7_115_0_user_clock2 (net)
                  0.17    0.00    6.34 ^ clkbuf_8_231_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.64    0.59    6.93 ^ clkbuf_8_231_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     9    0.12                           clknet_8_231_0_user_clock2 (net)
                  0.64    0.00    6.93 ^ clkbuf_leaf_1036_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.05    0.27    7.20 ^ clkbuf_leaf_1036_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     4    0.01                           clknet_leaf_1036_user_clock2 (net)
                  0.05    0.00    7.20 ^ _135620_/CLK (sky130_fd_sc_hd__dfxtp_1)
                  0.15    0.40    7.60 ^ _135620_/Q (sky130_fd_sc_hd__dfxtp_1)
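
To put a number on "a whole lot of buffers", a path report like the one above can be summarized per cell type. The following is a minimal Python sketch, not part of OpenROAD or OpenLane: the regular expression and the `summarize_buffers` helper are assumptions based only on the row layout visible in this report, and the sample holds just a few rows copied from it.

```python
import re
from collections import defaultdict

# A few rows copied from the path report above (slew, delay, time, edge, pin, cell).
SAMPLE = """\
                  0.06    0.00    1.40 ^ clkbuf_1_1_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    1.53 ^ clkbuf_1_1_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     1    0.01                           clknet_1_1_0_user_clock2 (net)
                  0.06    0.00    1.53 ^ clkbuf_1_1_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.06    0.13    1.66 ^ clkbuf_1_1_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
"""

# Matches pin rows only: slew, delay, time, edge (^ or v), instance/pin, (cell).
# Net rows and clock-source rows do not match and are skipped.
PIN_ROW = re.compile(
    r"^\s*(?P<slew>\d+\.\d+)\s+(?P<delay>\d+\.\d+)\s+(?P<time>\d+\.\d+)\s+[\^v]\s+"
    r"(?P<inst>\S+)/(?P<pin>\w+)\s+\((?P<cell>\w+)\)\s*$"
)

def summarize_buffers(report_text):
    """Count buffer stages per cell type and sum their cell delays in ns."""
    stages = defaultdict(int)
    delay = defaultdict(float)
    for line in report_text.splitlines():
        m = PIN_ROW.match(line)
        # Assumption: a buffer stage is an X (output) pin row of a buf/clkbuf cell.
        if m and m["pin"] == "X" and "buf" in m["cell"]:
            stages[m["cell"]] += 1
            delay[m["cell"]] += float(m["delay"])
    return stages, delay

stages, delay = summarize_buffers(SAMPLE)
for cell in stages:
    print(f"{cell}: {stages[cell]} stages, {delay[cell]:.2f} ns")
```

Feeding the full report text to `summarize_buffers` instead of `SAMPLE` would show how much of the insertion delay is spent in chains of fanout-1 `clkbuf_2` stages.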

If I change some of the defaults in OpenLane, I can improve both the max delay and skew quite a lot:

#set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {50};
set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {200};
#set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {0};
set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {1000};
Clock user_clock2
Latency      CRPR       Skew
_132486_/CLK ^
   4.20
_138166_/CLK ^
   3.24     -0.15       0.81

4.2ns of max delay and 0.8ns of skew. What is interesting is that we appear to have reduced the number of H-tree layers quite a lot, and yet the skew improved.

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock user_clock2 (rise edge)
                          0.00    0.00   clock source latency
                  0.61    0.47    0.47 ^ user_clock2 (in)
     1    0.13                           user_clock2 (net)
                  0.62    0.00    0.47 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
                  0.28    0.36    0.83 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
     1    0.33                           net621 (net)
                  0.63    0.29    1.12 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.25    0.43    1.55 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
     4    0.24                           clknet_0_user_clock2 (net)
                  0.26    0.04    1.59 ^ clkbuf_2_3_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.89    0.74    2.34 ^ clkbuf_2_3_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     4    0.17                           clknet_2_3_0_user_clock2 (net)
                  0.90    0.08    2.42 ^ clkbuf_4_14_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
                  0.48    0.57    3.00 ^ clkbuf_4_14_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
     2    0.09                           clknet_4_14_0_user_clock2 (net)
                  0.48    0.01    3.00 ^ clkbuf_5_28__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.30    0.45    3.45 ^ clkbuf_5_28__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    14    0.29                           clknet_5_28__leaf_user_clock2 (net)
                  0.30    0.00    3.46 ^ clkbuf_leaf_191_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
                  0.12    0.28    3.74 ^ clkbuf_leaf_191_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
    25    0.10                           clknet_leaf_191_user_clock2 (net)
                  0.12    0.00    3.74 ^ _131522_/CLK (sky130_fd_sc_hd__dfxtp_4)
                  0.07    0.43    4.17 v _131522_/Q (sky130_fd_sc_hd__dfxtp_4)

microwatt-cts.tar.gz

antonblanchard avatar Mar 23 '22 01:03 antonblanchard

Note the delays are not the min & max insertion delays, they are the min/max delays associated with the max skew.
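
As a sanity check on how the three report columns relate, the numbers quoted in this issue are consistent with the skew being the difference between the two listed latencies plus the (negative) CRPR adjustment. The Python sketch below is illustrative only, not OpenROAD code: the `skew_from_latencies` name is hypothetical, and the values are copied from the two reports in the original post.

```python
# (late latency, early latency, CRPR, reported skew) in ns, copied from the reports above.
REPORTS = {
    "OpenLane defaults": (8.01, 6.10, -0.13, 1.78),
    "max diameter 200, buffer distance 1000": (4.20, 3.24, -0.15, 0.81),
}

def skew_from_latencies(late, early, crpr):
    """Assumed relation: skew = late latency - early latency + CRPR."""
    return late - early + crpr

for name, (late, early, crpr, reported) in REPORTS.items():
    computed = skew_from_latencies(late, early, crpr)
    print(f"{name}: computed {computed:.2f} ns, reported {reported:.2f} ns")
```

So the two latencies in each report belong to the pair of sinks that defines the worst skew, which is why they need not be the smallest and largest insertion delays in the tree.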

maliberty avatar Mar 23 '22 18:03 maliberty

I can get a still better result by reversing the order of the buffers in

set ::env(CTS_CLK_BUFFER_LIST) {sky130_fd_sc_hd__clkbuf_8 sky130_fd_sc_hd__clkbuf_4 sky130_fd_sc_hd__clkbuf_2};

The largest buffer should be first (I have an open task for a new hire to make this insensitive to ordering). With that I get:

Clock user_clock2
Latency      CRPR       Skew
_136997_/CLK ^
   3.34
_135370_/CLK ^
   2.61     -0.15       0.58
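
Until CTS is insensitive to the ordering, the largest-first list can be generated mechanically. This is a minimal Python sketch that assumes the sky130 naming convention in which the trailing `_N` of a cell name is its drive strength; the helper names and the input order are illustrative.

```python
def drive_strength(cell_name):
    """Assumes the trailing _N in a sky130 cell name is the drive strength."""
    return int(cell_name.rsplit("_", 1)[1])

def largest_first(cells):
    """Return the buffer cells ordered from strongest to weakest drive."""
    return sorted(cells, key=drive_strength, reverse=True)

# Illustrative input order; only the cell names come from this thread.
buffers = [
    "sky130_fd_sc_hd__clkbuf_2",
    "sky130_fd_sc_hd__clkbuf_8",
    "sky130_fd_sc_hd__clkbuf_4",
]
ordered = largest_first(buffers)
print("set ::env(CTS_CLK_BUFFER_LIST) {%s};" % " ".join(ordered))
```

The printed line can be pasted into the OpenLane configuration as the override.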

maliberty avatar Mar 23 '22 19:03 maliberty

@sewkim @abk-openroad this is an autotuner opportunity

maliberty avatar Mar 23 '22 20:03 maliberty

https://github.com/RTimothyEdwards/open_pdks/issues/242

maliberty avatar Mar 23 '22 20:03 maliberty

@antonblanchard what are you hoping happens here with OR? These are OL overrides so I'm not sure what we can do about it in the software. I do think it is interesting to see if we can find better values through autotuning.

maliberty avatar Mar 23 '22 21:03 maliberty

@maliberty Thanks for the CTS_CLK_BUFFER_LIST tweak. My aim is to improve CTS via any means, so OpenLane tuning is good stuff.

As I've been removing buffers in the clock tree, I'm a little worried I'll end up with clock tree slew issues. It feels like an input to CTS should be a max slew (as well as max skew and perhaps max delay), but having said that, what is the upper bound on clock slew before bad things start happening (I presume false clock edges etc?).

I see the liberty files have 1.5ns for max transition on pretty much every pin (input/output of clkbuf, CLK input pin of FFs etc), so am I ok so long as I don't exceed this?

antonblanchard avatar Mar 24 '22 02:03 antonblanchard

Some conventional wisdom:

  • maxtran should never be more than 1/8 (ideally, 1/10) of the clock period

  • library maxtran is usually MUCH too large compared to realistic maxtran in real designs (e.g., with a too-large maxtran you lose too much crowbar current / internal power, and have too much vulnerability to crosstalk aggressors and consequent delay shifts)

  • at clock leaves especially, designers want to sharpen the transitions; there are often tighter maxtran constraints on FFs and clock buffers than on datapath instances

  • stay in the upper-left quarter of the Liberty table = first rule of thumb from designers

  • of course, too-tight maxtran constraints will result in oversizing / overdesign and harm design closure, so there is a balancing act here.
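
The first rule of thumb above turns into a simple budget calculation. The Python sketch below is an illustration only: the function name is hypothetical, the 10 ns clock period is an assumed example (the period of user_clock2 is not stated in this thread), and the 1.5 ns library limit is the value quoted from the liberty files earlier in the discussion.

```python
def max_transition_budget(clock_period_ns, library_limit_ns=1.5):
    """Apply the 1/8 (upper bound) and 1/10 (ideal) rules, capped by the library limit."""
    return {
        "upper": min(clock_period_ns / 8, library_limit_ns),
        "ideal": min(clock_period_ns / 10, library_limit_ns),
    }

# Hypothetical 10 ns clock period (100 MHz), chosen only for illustration.
budget = max_transition_budget(10.0)
print(f"upper bound {budget['upper']:.2f} ns, ideal {budget['ideal']:.2f} ns")
```

With these assumed inputs both limits are tighter than the 1.5 ns library value, which matches the point that the library maxtran is usually too loose to use as a design target.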

maliberty avatar Mar 24 '22 15:03 maliberty

In ORFS we call repair_clock_nets (https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/552bc27de3105d12b1d93755675cdb1445d08626/flow/scripts/cts.tcl#L44) but I don't see an equivalent in OL (@donn)

maliberty avatar Mar 24 '22 15:03 maliberty

@maliberty thanks for all the info, that helps a lot.

I see a call to repair_clock_nets in scripts/openroad/cts.tcl:

puts "\[INFO]: Repairing long wires on clock nets..."
# CTS leaves a long wire from the pad to the clock tree root.
repair_clock_nets -max_wire_length $::env(CTS_CLK_MAX_WIRE_LENGTH)

antonblanchard avatar Mar 27 '22 09:03 antonblanchard

repair_clock_nets is only to deal with the unbuffered wire from the pad/pin to the clock tree root. It doesn't do anything to the tree itself.

The skews in your reports are all reasonable so I don't see what you are worried about.

What exactly are you trying to improve? Insertion delay is a necessary evil that you can't remove. It can cause hold issues if the input arrivals do not account for it but that is just bad SDC.

If you look at your report you will see that most of the clock buffers have fanout 1 or 2, which combined with using a large buffer does not make much sense and will cause a large insertion delay. In your improved version the leaf clock buffer has a fanout of 4, which is still not even close to CTS_SINK_CLUSTERING_SIZE.

If I use the ORFS sky130hd default values I see this:

set ::env(CTS_CLK_BUFFER_LIST) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_ROOT_BUFFER) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {100};
set ::env(CTS_SINK_CLUSTERING_SIZE) {30};
set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {0};

Clock user_clock2
Latency      CRPR       Skew
140138/CLK ^
   6.910
131228/CLK ^
   5.683    -0.137      1.091

which still has a huge insertion delay. Using this as a starting point and playing autotuner on CTS_DISTANCE_BETWEEN_BUFFERS gets:

set ::env(CTS_CLK_BUFFER_LIST) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_ROOT_BUFFER) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {100};
set ::env(CTS_SINK_CLUSTERING_SIZE) {30};
set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {600};

Clock user_clock2
Latency      CRPR       Skew
135560/CLK ^
   4.433
137299/CLK ^
   3.282    -0.139      1.012

which is pretty close to your result. I think the biggest issue here is OpenLane's defaults.
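
For a side-by-side view, the results reported so far in this thread can be ranked with a few lines of Python. This is only a sketch: the labels are shorthand for the settings described in each comment, and ranking by skew first and max latency second is an assumed criterion, not something the autotuner is known to use.

```python
# (max latency ns, skew ns) for clock user_clock2, copied from the reports in this thread.
RESULTS = {
    "OpenLane defaults": (8.01, 1.78),
    "max diameter 200, buffer distance 1000": (4.20, 0.81),
    "CTS_CLK_BUFFER_LIST largest first": (3.34, 0.58),
    "ORFS sky130hd defaults": (6.910, 1.091),
    "ORFS defaults, buffer distance 600": (4.433, 1.012),
}

def rank_by_skew(results):
    """Sort configurations by skew, breaking ties on max latency (assumed criterion)."""
    return sorted(results.items(), key=lambda kv: (kv[1][1], kv[1][0]))

for label, (latency, skew) in rank_by_skew(RESULTS):
    print(f"{skew:6.3f} ns skew  {latency:6.3f} ns latency  {label}")
```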

jjcherry56 avatar Apr 04 '22 17:04 jjcherry56

CTS_CLK_BUFFER_LIST has been updated in open_pdks master, so hopefully the issue is fixed. Please re-open if further investigation is needed.

vijayank88 avatar Jun 19 '23 10:06 vijayank88