CTS produces clock trees with large max delay and large skew
I've been struggling with timing issues when taping out Microwatt on sky130. Part of the issue appears to be CTS which is producing clock trees with large max delay and large skew. An example is attached:
openroad run.tcl
Looking at the stats for user_clock2:
Clock user_clock2
Latency CRPR Skew
_138944_/CLK ^
8.01
_132706_/CLK ^
6.10 -0.13 1.78
8ns of max delay and 1.8ns of skew. If you look at one of the paths through the clock tree, there are a whole lot of buffers:
Fanout Cap Slew Delay Time Description
-----------------------------------------------------------------------------
0.00 0.00 clock user_clock2 (rise edge)
0.00 0.00 clock source latency
0.61 0.48 0.48 ^ user_clock2 (in)
1 0.14 user_clock2 (net)
0.62 0.00 0.48 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
0.29 0.36 0.84 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
1 0.33 net621 (net)
0.63 0.29 1.13 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.06 0.28 1.40 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
2 0.02 clknet_0_user_clock2 (net)
0.06 0.00 1.40 ^ clkbuf_1_1_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 1.53 ^ clkbuf_1_1_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_0_user_clock2 (net)
0.06 0.00 1.53 ^ clkbuf_1_1_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 1.66 ^ clkbuf_1_1_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_1_user_clock2 (net)
0.06 0.00 1.66 ^ clkbuf_1_1_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 1.79 ^ clkbuf_1_1_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_2_user_clock2 (net)
0.06 0.00 1.79 ^ clkbuf_1_1_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 1.92 ^ clkbuf_1_1_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_3_user_clock2 (net)
0.06 0.00 1.92 ^ clkbuf_1_1_4_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 2.06 ^ clkbuf_1_1_4_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_4_user_clock2 (net)
0.06 0.00 2.06 ^ clkbuf_1_1_5_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 2.19 ^ clkbuf_1_1_5_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_5_user_clock2 (net)
0.06 0.00 2.19 ^ clkbuf_1_1_6_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 2.32 ^ clkbuf_1_1_6_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_1_1_6_user_clock2 (net)
0.06 0.00 2.32 ^ clkbuf_1_1_7_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.42 0.38 2.71 ^ clkbuf_1_1_7_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.08 clknet_1_1_7_user_clock2 (net)
0.43 0.02 2.73 ^ clkbuf_2_3_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.07 0.22 2.94 ^ clkbuf_2_3_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_2_3_0_user_clock2 (net)
0.07 0.00 2.95 ^ clkbuf_2_3_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 3.08 ^ clkbuf_2_3_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_2_3_1_user_clock2 (net)
0.06 0.00 3.08 ^ clkbuf_2_3_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 3.21 ^ clkbuf_2_3_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_2_3_2_user_clock2 (net)
0.06 0.00 3.21 ^ clkbuf_2_3_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 3.34 ^ clkbuf_2_3_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_2_3_3_user_clock2 (net)
0.06 0.00 3.34 ^ clkbuf_2_3_4_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 3.47 ^ clkbuf_2_3_4_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_2_3_4_user_clock2 (net)
0.06 0.00 3.47 ^ clkbuf_2_3_5_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 3.60 ^ clkbuf_2_3_5_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_2_3_5_user_clock2 (net)
0.06 0.00 3.60 ^ clkbuf_2_3_6_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.33 0.32 3.92 ^ clkbuf_2_3_6_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.06 clknet_2_3_6_user_clock2 (net)
0.33 0.01 3.94 ^ clkbuf_3_7_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.07 0.21 4.14 ^ clkbuf_3_7_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_3_7_0_user_clock2 (net)
0.07 0.00 4.14 ^ clkbuf_3_7_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.07 0.14 4.28 ^ clkbuf_3_7_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_3_7_1_user_clock2 (net)
0.07 0.00 4.28 ^ clkbuf_3_7_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 4.41 ^ clkbuf_3_7_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_3_7_2_user_clock2 (net)
0.06 0.00 4.41 ^ clkbuf_3_7_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.21 0.24 4.65 ^ clkbuf_3_7_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.04 clknet_3_7_3_user_clock2 (net)
0.21 0.00 4.65 ^ clkbuf_4_14_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.18 4.83 ^ clkbuf_4_14_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_4_14_0_user_clock2 (net)
0.06 0.00 4.83 ^ clkbuf_4_14_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 4.96 ^ clkbuf_4_14_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_4_14_1_user_clock2 (net)
0.06 0.00 4.96 ^ clkbuf_4_14_2_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.04 0.12 5.07 ^ clkbuf_4_14_2_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_4_14_2_user_clock2 (net)
0.04 0.00 5.07 ^ clkbuf_4_14_3_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.20 0.23 5.30 ^ clkbuf_4_14_3_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.04 clknet_4_14_3_user_clock2 (net)
0.20 0.00 5.31 ^ clkbuf_5_28_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.07 0.18 5.48 ^ clkbuf_5_28_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_5_28_0_user_clock2 (net)
0.07 0.00 5.48 ^ clkbuf_5_28_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.20 0.23 5.72 ^ clkbuf_5_28_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.04 clknet_5_28_1_user_clock2 (net)
0.20 0.00 5.72 ^ clkbuf_6_57_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.07 0.18 5.90 ^ clkbuf_6_57_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
1 0.01 clknet_6_57_0_user_clock2 (net)
0.07 0.00 5.90 ^ clkbuf_6_57_1_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.16 0.20 6.10 ^ clkbuf_6_57_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.03 clknet_6_57_1_user_clock2 (net)
0.16 0.00 6.10 ^ clkbuf_7_115_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.17 0.24 6.34 ^ clkbuf_7_115_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.03 clknet_7_115_0_user_clock2 (net)
0.17 0.00 6.34 ^ clkbuf_8_231_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.64 0.59 6.93 ^ clkbuf_8_231_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
9 0.12 clknet_8_231_0_user_clock2 (net)
0.64 0.00 6.93 ^ clkbuf_leaf_1036_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.05 0.27 7.20 ^ clkbuf_leaf_1036_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
4 0.01 clknet_leaf_1036_user_clock2 (net)
0.05 0.00 7.20 ^ _135620_/CLK (sky130_fd_sc_hd__dfxtp_1)
0.15 0.40 7.60 ^ _135620_/Q (sky130_fd_sc_hd__dfxtp_1)
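To put a number on "a whole lot of buffers", here is a minimal sketch that counts buffer stages per tree level from the report text. It assumes the instance names follow the clkbuf_<level>_<index>_<stage>_<clock> pattern visible in the report above; the function name buffers_per_level is made up for illustration and is not part of OpenROAD or OpenLane.

```python
import re
from collections import Counter

# Matches driver pins such as "clkbuf_1_1_7_user_clock2/X". The three numeric
# fields are assumed to mean <level>_<index>_<stage>; root and leaf buffers
# (clkbuf_0_..., clkbuf_leaf_...) do not match and are ignored.
STAGE_RE = re.compile(r"clkbuf_(\d+)_(\d+)_(\d+)_\w+/X")


def buffers_per_level(report_text):
    """Count how many buffer outputs appear at each tree level."""
    counts = Counter()
    for match in STAGE_RE.finditer(report_text):
        counts[int(match.group(1))] += 1
    return dict(counts)


# A few lines copied from the report above.
sample = """\
0.06 0.13 1.53 ^ clkbuf_1_1_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
0.06 0.13 1.66 ^ clkbuf_1_1_1_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
0.42 0.38 2.71 ^ clkbuf_1_1_7_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
0.07 0.22 2.94 ^ clkbuf_2_3_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
"""
print(buffers_per_level(sample))
```

Counting by hand on the full path above gives 8, 7, 4, 4, 2, 2, 1 and 1 stages for levels 1 to 8, so most of the chain sits in the first two levels.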
If I change some of the defaults in Openlane, I can improve both the max delay and skew quite a lot:
#set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {50};
set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {200};
#set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {0};
set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {1000};
Clock user_clock2
Latency CRPR Skew
_132486_/CLK ^
4.20
_138166_/CLK ^
3.24 -0.15 0.81
4.2ns of max delay and 0.8ns of skew. What is interesting is that we appear to have reduced the number of H-tree layers quite a lot, and yet the skew improved.
Fanout Cap Slew Delay Time Description
-----------------------------------------------------------------------------
0.00 0.00 clock user_clock2 (rise edge)
0.00 0.00 clock source latency
0.61 0.47 0.47 ^ user_clock2 (in)
1 0.13 user_clock2 (net)
0.62 0.00 0.47 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
0.28 0.36 0.83 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
1 0.33 net621 (net)
0.63 0.29 1.12 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.25 0.43 1.55 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
4 0.24 clknet_0_user_clock2 (net)
0.26 0.04 1.59 ^ clkbuf_2_3_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.89 0.74 2.34 ^ clkbuf_2_3_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
4 0.17 clknet_2_3_0_user_clock2 (net)
0.90 0.08 2.42 ^ clkbuf_4_14_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.48 0.57 3.00 ^ clkbuf_4_14_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.09 clknet_4_14_0_user_clock2 (net)
0.48 0.01 3.00 ^ clkbuf_5_28__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.30 0.45 3.45 ^ clkbuf_5_28__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
14 0.29 clknet_5_28__leaf_user_clock2 (net)
0.30 0.00 3.46 ^ clkbuf_leaf_191_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.12 0.28 3.74 ^ clkbuf_leaf_191_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
25 0.10 clknet_leaf_191_user_clock2 (net)
0.12 0.00 3.74 ^ _131522_/CLK (sky130_fd_sc_hd__dfxtp_4)
0.07 0.43 4.17 v _131522_/Q (sky130_fd_sc_hd__dfxtp_4)
Note the delays are not the min & max insertion delays, they are the min/max delays associated with the max skew.
I can get a still better result by reversing the order of the buffers in
set ::env(CTS_CLK_BUFFER_LIST) {sky130_fd_sc_hd__clkbuf_8 sky130_fd_sc_hd__clkbuf_4 sky130_fd_sc_hd__clkbuf_2};
The largest buffer should be first (I have an open task for a new hire to make this insensitive to ordering). With that I get:
Clock user_clock2
Latency CRPR Skew
_136997_/CLK ^
3.34
_135370_/CLK ^
2.61 -0.15 0.58
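As a sanity check on the three reports, the numbers are consistent with skew = max latency - min latency + CRPR. The sketch below just redoes that arithmetic on the values quoted in this thread; the run labels and the helper name skew_ns are mine, and this is an observation about these numbers rather than a statement of how the tool computes skew internally.

```python
# (max latency, min latency, CRPR, reported skew) in ns, copied from the
# three reports in this thread. The labels are informal.
RUNS = {
    "initial run": (8.01, 6.10, -0.13, 1.78),
    "tuned diameter and buffer distance": (4.20, 3.24, -0.15, 0.81),
    "largest buffer first": (3.34, 2.61, -0.15, 0.58),
}


def skew_ns(max_latency, min_latency, crpr):
    """Latency difference plus the (negative) CRPR credit, rounded to 2 places."""
    return round(max_latency - min_latency + crpr, 2)


for name, (lat_max, lat_min, crpr, reported) in RUNS.items():
    computed = skew_ns(lat_max, lat_min, crpr)
    print(f"{name}: computed {computed:.2f} ns, reported {reported:.2f} ns")
```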
@sewkim @abk-openroad this is an autotuner opportunity
https://github.com/RTimothyEdwards/open_pdks/issues/242
@antonblanchard what are you hoping happens here with OR? These are OL overrides so I'm not sure what we can do about it in the software. I do think it would be interesting to see if we can find better values through autotuning.
@maliberty Thanks for the CTS_CLK_BUFFER_LIST tweak. My aim is to improve CTS via any means, so openlane tuning is good stuff.
As I've been removing buffers in the clock tree, I'm a little worried I'll end up with clock tree slew issues. It feels like an input to CTS should be a max slew (as well as max skew and perhaps max delay), but having said that, what is the upper bound on clock slew before bad things start happening (I presume false clock edges etc?).
I see the liberty files have 1.5ns for max transition on pretty much every pin (input/output of clkbuf, CLK input pin of FFs etc), so am I ok so long as I don't exceed this?
Some conventional wisdom:
- maxtran should never be more than 1/8 (ideally, 1/10) of the clock period
- library maxtran is usually MUCH too large compared to realistic maxtran in real designs (e.g., with a too-large maxtran you lose too much crowbar current / internal power, and have too much vulnerability to crosstalk aggressors and consequent delay shifts)
- at clock leaves especially, designers want to sharpen the transitions; there are often tighter maxtran constraints on FFs and clock buffers than on datapath instances
- stay in the upper-left quarter of the Liberty table = first rule of thumb from designers
- of course, too-tight maxtran constraints will result in oversizing / overdesign and harm design closure, so there is a balancing act here.
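To make the 1/8 and 1/10 rule concrete, here is a small sketch that turns a clock period into slew targets and compares them with the 1.5ns library limit mentioned earlier. The clock periods are hypothetical (the actual period is not stated in this thread) and the function names are only for illustration.

```python
LIBERTY_MAX_TRANSITION_NS = 1.5  # value quoted from the sky130 liberty files


def maxtran_targets_ns(clock_period_ns):
    """Return (ideal, upper) slew targets: 1/10 and 1/8 of the clock period."""
    return clock_period_ns / 10.0, clock_period_ns / 8.0


def effective_limit_ns(clock_period_ns):
    """The tighter of the 1/8 rule of thumb and the library limit."""
    _, upper = maxtran_targets_ns(clock_period_ns)
    return min(upper, LIBERTY_MAX_TRANSITION_NS)


# Hypothetical clock periods, chosen only to show where each limit binds.
for period in (5.0, 10.0, 20.0):
    ideal, upper = maxtran_targets_ns(period)
    print(f"period {period:.1f} ns: ideal {ideal:.3f} ns, upper {upper:.3f} ns, "
          f"effective {effective_limit_ns(period):.3f} ns")
```

For short periods the rule of thumb is tighter than the library value, which is the point of the second bullet: staying under 1.5ns alone would not be enough.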
In ORFS we call repair_clock_nets (https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/552bc27de3105d12b1d93755675cdb1445d08626/flow/scripts/cts.tcl#L44) but I don't see an equivalent in OL (@donn)
@maliberty thanks for all the info, that helps a lot.
I see a call to repair_clock_nets in scripts/openroad/cts.tcl:
puts "\[INFO]: Repairing long wires on clock nets..."
# CTS leaves a long wire from the pad to the clock tree root.
repair_clock_nets -max_wire_length $::env(CTS_CLK_MAX_WIRE_LENGTH)
repair_clock_nets is only to deal with the unbuffered wire from the pad/pin to the clock tree root. It doesn't do anything to the tree itself.
The skews in your reports are all reasonable so I don't see what you are worried about.
What exactly are you trying to improve? Insertion delay is a necessary evil that you can't remove. It can cause hold issues if the input arrivals do not account for it but that is just bad SDC.
If you look at your report you will see that most of the clock buffers have a fanout of 1 or 2, which combined with using a large buffer does not make much sense and will cause a large insertion delay. In your improved version the leaf clock buffer has a fanout of 4, which is still not even close to CTS_SINK_CLUSTERING_SIZE.
If I use the ORFS sky130hd default values I see this:
set ::env(CTS_CLK_BUFFER_LIST) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_ROOT_BUFFER) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {100};
set ::env(CTS_SINK_CLUSTERING_SIZE) {30};
set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {0};
Clock user_clock2
Latency CRPR Skew
_140138_/CLK ^
6.910
_131228_/CLK ^
5.683 -0.137 1.091
which still has a huge insertion delay. Using this as a starting point and playing autotuner on CTS_DISTANCE_BETWEEN_BUFFERS gets:
set ::env(CTS_CLK_BUFFER_LIST) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_ROOT_BUFFER) {sky130_fd_sc_hd__clkbuf_4};
set ::env(CTS_SINK_CLUSTERING_MAX_DIAMETER) {100};
set ::env(CTS_SINK_CLUSTERING_SIZE) {30};
set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {600};
Clock user_clock2
Latency CRPR Skew
_135560_/CLK ^
4.433
_137299_/CLK ^
3.282 -0.139 1.012
which is pretty close to your result. I think the biggest issue here is OpenLane's defaults.
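For anyone repeating the "playing autotuner" step by hand, the bookkeeping can be sketched as below. It only ranks the two results already quoted above and prints the matching override line; a real sweep would rerun CTS for each candidate value. The helper names are hypothetical and nothing here calls OpenLane or OpenROAD.

```python
# Measured results quoted in this thread:
# CTS_DISTANCE_BETWEEN_BUFFERS -> (max latency ns, skew ns).
MEASURED = {
    0: (6.910, 1.091),
    600: (4.433, 1.012),
}


def render_override(distance):
    """Return the OpenLane config line for one candidate value."""
    return "set ::env(CTS_DISTANCE_BETWEEN_BUFFERS) {%d};" % distance


def best_distance(results):
    """Pick the candidate with the lowest max latency, breaking ties on skew."""
    return min(results, key=lambda d: results[d])


best = best_distance(MEASURED)
print(render_override(best))
```

Ranking on latency first and skew second is just one possible objective; weighting skew more heavily would be equally reasonable depending on what is failing timing.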
CTS_CLK_BUFFER_LIST has been updated in open_pdks master, so hopefully the issue is fixed. Please re-open if further investigation is needed.