CTS: post-CTS optimization makes things worse
I've been trying to reduce clock tree max delay and skew in some sky130hd designs. One thing I have noticed is that the post-CTS optimization almost always makes things worse. As an example:
Enabled:

Clock user_clock2
                  Latency   CRPR   Skew
_135639_/CLK ^       9.87
_129357_/CLK ^       7.55  -0.28   2.04

Disabled:

Clock user_clock2
                  Latency   CRPR   Skew
_132397_/CLK ^       8.27
_130828_/CLK ^       5.95  -0.36   1.95
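To put the two reports side by side, the deltas work out as below; a quick sketch using the numbers copied from the reports above:

```python
# Compare the two report_clock_skew results above (values in ns).
# max_latency / min_latency are the arrivals at the two listed endpoints.
enabled  = {"max_latency": 9.87, "min_latency": 7.55, "skew": 2.04}
disabled = {"max_latency": 8.27, "min_latency": 5.95, "skew": 1.95}

for key in enabled:
    delta = enabled[key] - disabled[key]
    print(f"{key}: enabled={enabled[key]} disabled={disabled[key]} delta={delta:+.2f}")
```

So enabling the optimization costs about 1.6 ns of insertion delay at both endpoints while the skew barely moves.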
These are post-DRT numbers, just to be sure the difference was real. I experimented with making the threshold for fixing larger (currently set at 5x), but that didn't seem to help. Looking at the modification it made to one of the longest paths, I see a bunch of clkbuf_16 instances added:
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_0_user_clock2 (.A(clknet_6_29__leaf_user_clock2),
+ .X(clknet_opt_1_0_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_1_user_clock2 (.A(clknet_opt_1_0_user_clock2),
+ .X(clknet_opt_1_1_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_2_user_clock2 (.A(clknet_opt_1_1_user_clock2),
+ .X(clknet_opt_1_2_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_3_user_clock2 (.A(clknet_opt_1_2_user_clock2),
+ .X(clknet_opt_1_3_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_4_user_clock2 (.A(clknet_opt_1_3_user_clock2),
+ .X(clknet_opt_1_4_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_5_user_clock2 (.A(clknet_opt_1_4_user_clock2),
+ .X(clknet_opt_1_5_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_6_user_clock2 (.A(clknet_opt_1_5_user_clock2),
+ .X(clknet_opt_1_6_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_7_user_clock2 (.A(clknet_opt_1_6_user_clock2),
+ .X(clknet_opt_1_7_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_8_user_clock2 (.A(clknet_opt_1_7_user_clock2),
+ .X(clknet_opt_1_8_user_clock2));
+ sky130_fd_sc_hd__clkbuf_16 clkbuf_opt_1_9_user_clock2 (.A(clknet_opt_1_8_user_clock2),
+ .X(clknet_opt_1_9_user_clock2));
That's a lot of buffers, each of which adds a decent amount of delay. Could this optimization be better suited to more advanced nodes than 130 nm, where the gate-delay vs. wire-delay trade-off is different? Should we just disable it on sky130?
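A rough back-of-envelope check: the per-stage delay below is an assumed value, not taken from the sky130hd Liberty data, but it shows why a ten-deep chain of clkbuf_16 is plausibly the whole regression:

```python
# Estimate the insertion delay added by the buffer chain in the diff above.
# STAGE_DELAY_NS is an ASSUMED cell + net delay per stage, NOT a value
# characterized from the sky130_fd_sc_hd Liberty data.
N_STAGES = 10            # clkbuf_opt_1_0 .. clkbuf_opt_1_9 in the diff
STAGE_DELAY_NS = 0.15    # assumption: ~150 ps per loaded clkbuf_16 stage

added_latency = N_STAGES * STAGE_DELAY_NS
print(f"~{added_latency:.2f} ns of extra insertion delay")  # ~1.50 ns
```

With that assumption the chain alone is in the same ballpark as the ~1.6 ns latency difference between the two reports, though a real answer would use the Liberty timing arcs rather than a guessed per-stage number.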
I can package up this test case if it helps.
It's something of a leftover from before we had repair_clock_nets, and it is pretty simplistic. If it works better without it, I would just turn it off.
Wouldn't it be better to auto-detect a worse result and just no-op?
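The guard could be as simple as measuring worst latency before and after the pass and rolling back on a regression. A hypothetical sketch — the checkpoint/restore and measurement callables are illustrative placeholders, not the actual OpenROAD/OpenSTA API:

```python
# Illustrative guard for an optimization pass: keep the result only if the
# metric did not get worse. All four callables are HYPOTHETICAL placeholders.
def run_guarded(optimize, measure_worst_latency, checkpoint, restore):
    before = measure_worst_latency()
    state = checkpoint()            # snapshot the netlist state
    optimize()
    after = measure_worst_latency()
    if after > before:              # optimization regressed: undo it
        restore(state)
        return before
    return after

# Toy example where the pass makes latency worse (numbers from the reports):
design = {"latency": 8.27}
result = run_guarded(
    lambda: design.update(latency=9.87),   # "optimization" that regresses
    lambda: design["latency"],
    lambda: dict(design),
    lambda s: design.update(s),
)
print(result)  # 8.27 -- the regression was rolled back
```

The key design choice is that the guard only needs a cheap, consistent metric (worst clock latency here) and a way to snapshot/restore; it never has to understand why the pass regressed.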
This was removed in 1048d6658a6788b0863bf8d43b58f3e197bb3416