Excessive buffering of high-fanout net driven by IO cell in resizer.tcl
Describe the bug
I have a testcase design comprising a few I/O cells and 501 flop instances. The SE pins of the flops are hooked up directly to the internal output pin of an I/O cell. Basically, the testcase uses the embedded MUX of the MuxD flip-flops to add a reset to the flops. (This testcase was created solely to reproduce the buffering issue found in a different design.)
module testcase (
[...]
input rst,
[...]
parameter FLOPS = 500;
[...]
wire rst_int;
[...]
(* keep *) gf180mcu_fd_io__in_s pad_rst(
.PU(1'b0), .PD(1'b0),
.PAD(rst),
.Y(rst_int)
);
[...]
wire [FLOPS-1:0] flop_q;
gf180mcu_fd_sc_mcu7t5v0__sdffq_1 flops [FLOPS-1:0] (
.CLK({FLOPS{clk_int}}),
.SE({FLOPS{rst_int}}),
.SI({FLOPS{1'b0}}),
.D({flop_q[FLOPS-2:0], din_int}),
.Q(flop_q)
);
[...]
During execution of resizer.tcl, the placement utilization increases from 15% before repair_design to 49% after repair_design, an area increase of roughly +238%. The issue is only triggered when the net is driven by the IO cell; if the net is a primary IO, the buffering does not happen.
Design area 75230 u^2 15% utilization.
[INFO RSZ-0058] Using max wire length 6781um.
[INFO RSZ-0034] Found 1 slew violations.
[INFO RSZ-0035] Found 1 fanout violations.
[INFO RSZ-0036] Found 1 capacitance violations.
[INFO RSZ-0038] Inserted 5087 buffers in 1 nets.
[INFO RSZ-0039] Resized 5083 instances.
Design area 254327 u^2 49% utilization.
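For reference, the relative area increase follows directly from the two reported area figures (pure arithmetic from the log above, no tool assumptions):

```python
# Area figures reported around repair_design in the log above (u^2).
area_before = 75230   # 15% utilization
area_after = 254327   # 49% utilization

increase = (area_after - area_before) / area_before
print(f"relative area increase: {increase:.0%}")  # → relative area increase: 238%
```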
If a buffer is inserted between the IO cell and the high-fanout net, the excessive buffering does not happen.
For that purpose, this code was added to resizer.tcl within the test case.
(Adding the buffer inside the RTL would have also worked.)
make_net rst_int_buffered
make_inst rst_int_buf gf180mcu_fd_sc_mcu7t5v0__buf_4
foreach pin [get_pins -of_objects [get_nets rst_int] -filter "direction == input"] {
disconnect_pin rst_int $pin
connect_pin rst_int_buffered $pin
}
connect_pin rst_int_buffered rst_int_buf/Z
connect_pin rst_int rst_int_buf/I
# place new cell rst_int_buf
global_placement -incremental
Now only moderate buffering occurs: the added buffer count drops from 5087 to 68, and the placement utilization remains at 15%.
Design area 75260 u^2 15% utilization.
[INFO RSZ-0058] Using max wire length 6781um.
[INFO RSZ-0035] Found 1 fanout violations.
[INFO RSZ-0038] Inserted 68 buffers in 1 nets.
[INFO RSZ-0039] Resized 70 instances.
Design area 76648 u^2 15% utilization.
I have additionally attached an ipython notebook for Google Colab to reproduce the issue.
/cc @proppy /cc @dhaentz1
Expected Behavior
Only moderate buffering should occur when repairing high-fanout nets; buffering should not consume a multiple of the area the design occupied before repair.
OpenROAD Environment
Env.sh seems not to be included in my litex-hub package - please use the testcase notebook to extract data.
OpenLane Environment
env.py seems to be broken in my setup - please use the testcase notebook to extract data.
To Reproduce
OpenRoad__excessive_buffering_on_high_fanout_net_when_resizing.ipynb.gz _build.excessive_buffering_in_resizer.zip
Relevant log output
Design area 75230 u^2 15% utilization.
[INFO RSZ-0058] Using max wire length 6781um.
[INFO RSZ-0034] Found 1 slew violations.
[INFO RSZ-0035] Found 1 fanout violations.
[INFO RSZ-0036] Found 1 capacitance violations.
[INFO RSZ-0038] Inserted 5087 buffers in 1 nets.
[INFO RSZ-0039] Resized 5083 instances.
Design area 254327 u^2 49% utilization.
Screenshots
No response
Additional Context
No response
This is because the default_max_fanout value for the IO cells is set to 1 in the liberty files. Changing it to a more reasonable number (10) fixes the issue. Adding a buffer in the middle (one with a reasonable default_max_fanout) achieves the same goal.
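For illustration, default_max_fanout is a library-level liberty default that applies to every output pin without its own max_fanout attribute; a fragment along these lines (the library name and comment are illustrative, only the attribute itself is taken from the liberty spec) is what repair_design picks up for the IO cell:

```
library (gf180mcu_fd_io) {
  /* With this default, every output pin lacking an explicit
     max_fanout attribute -- including the IO cell's Y pin --
     is limited to driving a single load. */
  default_max_fanout : 1;
}
```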
I disagree with the solution (though I agree a max-fanout of 1 is probably inappropriate for an IO cell). The solution implies OR cannot build buffer trees if the driver cell has a max-fanout of 1, which may be reasonable for other cells. The correct fix would be for OR to insert a single buffer to satisfy the max-fanout-1 requirement, and then build the buffer tree behind that buffer.
Please re-open the issue.
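To put numbers on the proposed fix: a quick sketch (assuming a balanced fanout tree, an assumed max fanout of 10 for ordinary buffers, and ignoring wire length, slew, and cap limits) shows that one buffer satisfying the driver's max-fanout-1 constraint plus a tree below it needs on the order of tens of buffers for 500 sinks, not thousands:

```python
import math

def tree_buffers(sinks, max_fanout):
    """Buffers needed for a balanced tree in which every buffer
    drives at most max_fanout loads."""
    count = 0
    level = sinks
    while level > 1:
        level = math.ceil(level / max_fanout)  # buffers on this level
        count += level
    return count

SINKS = 500          # SE pins on the high-fanout net
BUF_MAX_FANOUT = 10  # assumed limit for an ordinary buffer cell

# One buffer satisfies the IO driver's max_fanout of 1; the rest
# form the tree that actually distributes the signal.
total = 1 + tree_buffers(SINKS, BUF_MAX_FANOUT)
print(total)  # → 57
```

That is in the same ballpark as the 68 buffers the manual workaround produces, and nowhere near the 5087 inserted originally.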
I don't think the buffering code is going to have this kind of human-level intelligence in the near future. If a driver cell legitimately has a max fanout of 1, one hopes that it is not connected to 500 other cells, because then the resizer is stuck fixing problems created by some other module (and the problem should be fixed at the source). We can keep this issue open if you like.
@openroadie it seems simple enough to fix the fanout violation. Is there a reason rsz can't insert one buffer which can drive a larger fanout (or subsequent buffers). It doesn't seem that different from fixing a max cap or slew violation where a buffer tree is needed.