Global place with routability drops solution if it fails to perform routability

Open gadfort opened this issue 9 months ago • 25 comments

Describe the bug

When running global place with routability it eventually fails with:

[INFO GPL-0087] FinalRC: 1.0801135
[INFO GPL-0079] MinRC (1.079737) violation occurred, total count: 3.
[INFO GPL-0045] InflatedAreaDelta:      411.943 um^2 (+0.25%)
[INFO GPL-0046] TargetDensity:            0.189
Revert Routability Procedure. Target density higher than max, or minRC max violations.
[INFO GPL-0080] minRcViolatedCnt: 3
[INFO GPL-0047] SavedMinRC: 1.0797
[INFO GPL-0048] SavedTargetDensity: 0.1875
[INFO GPL-0089] Routability: revert back to snapshot

and continues to run global placement, but without any of the interim routability solutions, eventually resulting in a design that cannot be routed.

Expected Behavior

Either:

  1. continue attempting routability until it works
  2. preserve the last solution even if it's not perfect, since it's probably better than nothing

Environment

OpenROAD v2.0-19622-g9e747a8bc 
Features included (+) or not (-): +GPU +GUI +Python

To Reproduce

sc_issue_darkriscv_job0_ihp130_sg13g2_stdcell_place.global0_20250325-134902.tar.gz

tar xvf sc_issue_darkriscv_job0_ihp130_sg13g2_stdcell_place.global0_20250325-134902.tar.gz
cd sc_issue_darkriscv_job0_ihp130_sg13g2_stdcell_place
./run.sh

Relevant log output

[NesterovSolve] Iter:  860 overflow: 0.329 HPWL: 1094242171
[INFO GPL-0075] Routability iteration: 8
[INFO GPL-0036] TileBBox: (    0    0 ) ( 7200 7200 ) DBU
[INFO GPL-0038] TileCnt:     180  180
[INFO GPL-0040] NumTiles: 32400
[INFO GPL-0081] TotalRouteOverflow: 63.8768
[INFO GPL-0082] OverflowTileCnt: 4306
[INFO GPL-0083] 0.5%RC: 1.0888
[INFO GPL-0084] 1.0%RC: 1.0714
[INFO GPL-0085] 2.0%RC: 1.0552
[INFO GPL-0086] 5.0%RC: 1.0336
[INFO GPL-0087] FinalRC: 1.0801135
[INFO GPL-0079] MinRC (1.079737) violation occurred, total count: 3.
[INFO GPL-0045] InflatedAreaDelta:      411.943 um^2 (+0.25%)
[INFO GPL-0046] TargetDensity:            0.189
Revert Routability Procedure. Target density higher than max, or minRC max violations.
[INFO GPL-0080] minRcViolatedCnt: 3
[INFO GPL-0047] SavedMinRC: 1.0797
[INFO GPL-0048] SavedTargetDensity: 0.1875
[INFO GPL-0089] Routability: revert back to snapshot
[NesterovSolve] Iter:  870 overflow: 0.587 HPWL: 835816654
[NesterovSolve] Iter:  880 overflow: 0.543 HPWL: 881393381
[NesterovSolve] Iter:  890 overflow: 0.493 HPWL: 933310243
[NesterovSolve] Iter:  900 overflow: 0.445 HPWL: 995779753
[NesterovSolve] Iter:  910 overflow: 0.396 HPWL: 1037078844
[NesterovSolve] Iter:  920 overflow: 0.351 HPWL: 1068994465
[NesterovSolve] Iter:  930 overflow: 0.311 HPWL: 1092082480
[NesterovSolve] Iter:  940 overflow: 0.277 HPWL: 1105598257
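
To compare runs, it can help to pull the per-iteration FinalRC values and the number of reverts out of a log like the one above. The following is a minimal sketch that relies only on the message IDs visible in this log (GPL-0075, GPL-0087, GPL-0089); the function name `summarize_routability` is hypothetical and not part of OpenROAD.

```python
import re

# Matches lines such as "[INFO GPL-0087] FinalRC: 1.0801135".
GPL_LINE = re.compile(r"\[INFO GPL-(\d{4})\]\s*(.*)")

def summarize_routability(log_text):
    """Collect FinalRC per routability iteration and count snapshot reverts."""
    final_rc = {}
    reverts = 0
    iteration = None
    for line in log_text.splitlines():
        m = GPL_LINE.match(line.strip())
        if not m:
            continue
        code, msg = m.groups()
        if code == "0075":                               # "Routability iteration: N"
            iteration = int(msg.split(":")[1])
        elif code == "0087" and iteration is not None:   # "FinalRC: x"
            final_rc[iteration] = float(msg.split(":")[1])
        elif code == "0089":                             # "Routability: revert back to snapshot"
            reverts += 1
    return final_rc, reverts

sample = """\
[INFO GPL-0075] Routability iteration: 8
[INFO GPL-0087] FinalRC: 1.0801135
[INFO GPL-0089] Routability: revert back to snapshot
"""
print(summarize_routability(sample))
```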

Screenshots

No response

Additional Context

No response

gadfort avatar Mar 25 '25 17:03 gadfort

@gudeh I'd like to know why routability isn't making any progress. There appears to be room for spreading the cells out.

When this fails we should distinguish the max density case, which does require a revert, from the no progress case which doesn't need one.

maliberty avatar Mar 25 '25 18:03 maliberty

The routability iterations failed to reduce the routing congestion (RC) below the minRC threshold three times. As a result, the process gave up and reverted to the saved snapshot. However, it still uses the inflation values that led to the minRC iteration, so, in theory, it reverts to the state that achieved the best routing congestion observed.

In other words, theoretically, it should follow the same path that led to the minRC. But I’ve never actually checked whether this happens in practice.
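
The behaviour described above, together with the earlier suggestion to separate the max-density case from the no-progress case, can be sketched roughly as follows. This is an illustration under stated assumptions, not OpenROAD's implementation: the class and enum names are hypothetical, the `max_density` cap is invented, the counter reset on improvement is assumed, and the intermediate history values are made up (only 1.0797, 0.1875, 1.0801 and 0.189 echo the log).

```python
from dataclasses import dataclass
from enum import Enum

class StopReason(Enum):
    CONTINUE = "continue"
    NO_PROGRESS = "no_progress"    # RC failed to beat min_rc too many times
    MAX_DENSITY = "max_density"    # target density exceeded the allowed maximum

@dataclass
class RoutabilityTracker:
    max_violations: int = 3        # matches "total count: 3" in the log
    max_density: float = 0.99      # hypothetical cap, not taken from the log
    min_rc: float = float("inf")
    violations: int = 0

    def update(self, final_rc, target_density):
        if target_density > self.max_density:
            return StopReason.MAX_DENSITY
        if final_rc < self.min_rc:
            self.min_rc = final_rc  # new best: a snapshot would be saved here
            self.violations = 0     # assumption: the counter resets on improvement
            return StopReason.CONTINUE
        self.violations += 1
        if self.violations >= self.max_violations:
            return StopReason.NO_PROGRESS
        return StopReason.CONTINUE

tracker = RoutabilityTracker()
# (FinalRC, TargetDensity) per routability iteration; middle rows are illustrative.
history = [(1.0797, 0.1875), (1.0850, 0.1880), (1.0812, 0.1885), (1.0801, 0.1890)]
reasons = [tracker.update(rc, d) for rc, d in history]
print(reasons[-1], tracker.min_rc)
```

Returning a distinct reason lets the caller revert only for `MAX_DENSITY` and keep the current placement for `NO_PROGRESS`, which is the distinction proposed earlier in the thread.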

gudeh avatar Mar 25 '25 18:03 gudeh

@gudeh I'd like to know why routability isn't making any progress. There appears to be room for spreading the cells out.

When this fails we should distinguish the max density case, which does require a revert, from the no progress case which doesn't need one.

Do you know how to use debug mode with @gadfort SC artifacts?

gudeh avatar Mar 25 '25 18:03 gudeh

@gudeh I assume you will need to modify the script to allow you to do some debugging, since there isn't anything built in. Look for this file: sc_global_placement.tcl

gadfort avatar Mar 25 '25 19:03 gadfort

For debug mode with the SC artifact:

For adding -gui: sc_issue_darkriscv_job0_ihp130_sg13g2_stdcell_place/build/darkriscv/job0_ihp130_sg13g2_stdcell/place.global/0/replay.sh

For adding global_placement_debug: sc_issue_darkriscv_job0_ihp130_sg13g2_stdcell_place/build/darkriscv/job0_ihp130_sg13g2_stdcell/sc_collected_files/scripts_47796b273299036fb7ff7ec933bb5669cfd57b7c/common/procs.tcl

gudeh avatar Mar 27 '25 16:03 gudeh

I tried increasing the inflation parameters with different values, but routing congestion does not seem to improve even with more inflation.

The following image is from a run where I did a high increase in inflation:

Image

However, the routing congestion is very similar to the run with default values: the congestion calculated during routability is close, and the RUDY heatmaps also look alike.

Image

Here is the default for comparison:

Image

gudeh avatar Apr 16 '25 15:04 gudeh

In summary, my understanding is that RUDY congestion does not decrease even when the cells are more spread out.

gudeh avatar Apr 16 '25 18:04 gudeh

In the "high increase in inflation" run, the cells still don't look maximally spread out. When we inflate, do we reduce the space-filling cells?

maliberty avatar Apr 16 '25 18:04 maliberty

Ah, well remembered. We do not remove filler gcells to compensate for the extra area; instead, we increase the density. Maybe this is one more reason to focus on instance removal, since it could also allow fillers to be removed.

gudeh avatar Apr 16 '25 19:04 gudeh

In the run with increased inflation, the density goes from 0.1935 to 0.2383 due to inflation alone, and then to 0.261 due to the timing-driven non-virtual iterations afterwards.

gudeh avatar Apr 16 '25 19:04 gudeh

Increasing the density moves in the opposite direction to inflation, so that makes it less likely to work. What are the issues with instance removal?
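
The trade-off discussed here can be made concrete with a small arithmetic sketch. It assumes a simplified model in which filler area is sized so that (movable + filler) / whitespace equals the target density; the function names and all numbers are hypothetical and are not taken from this design or from OpenROAD's code.

```python
def filler_area(whitespace, movable, target_density):
    """Filler area needed so that (movable + filler) / whitespace == target_density."""
    return whitespace * target_density - movable

def inflate_raise_density(whitespace, movable, filler, delta):
    """Keep all fillers; the inflated area is absorbed by a higher target density."""
    return (movable + delta + filler) / whitespace, filler

def inflate_remove_fillers(whitespace, movable, filler, delta):
    """Keep the target density; drop filler area equal to the inflation."""
    new_filler = max(filler - delta, 0.0)
    return (movable + delta + new_filler) / whitespace, new_filler

# Illustrative numbers only (arbitrary area units).
whitespace, movable, density = 1000.0, 150.0, 0.20
filler = filler_area(whitespace, movable, density)

# Raising density: fillers still occupy the same area, so cells are packed tighter.
print(inflate_raise_density(whitespace, movable, filler, 20.0))
# Removing fillers: density is unchanged, so the inflated cells gain real room.
print(inflate_remove_fillers(whitespace, movable, filler, 20.0))
```

In this model, raising the density leaves the fillers in place and squeezes everything into a tighter packing, while removing fillers hands the freed area to the inflated cells.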

maliberty avatar Apr 16 '25 20:04 maliberty

Fillers are stored in vectors, a situation similar to when we inserted cells for the non-virtual TD iterations.

I have a branch able to listen to odb callbacks when rsz uses its function to remove all buffers. It's working without crashing; I just need to test it more thoroughly. We should then be able to have the new rebuffering from Martin working during gpl. After this is settled, I would be able to use the feature to also remove filler gcells during routability.

If we simply remove the fillers today, the code crashes because of pointer invalidation in the vectors.
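
As a rough analogy for the invalidation problem: Python has no raw pointers, so cached list indices stand in for them below. Removing an element from the middle shifts everything after it, and any cached position becomes stale. The swap-and-pop helper is one common remedy for such containers; it is a hypothetical illustration, not necessarily what the branch mentioned above does.

```python
# Analogy only: stale list indices play the role of invalidated pointers.
fillers = ["f0", "f1", "f2", "f3"]
index_of = {name: i for i, name in enumerate(fillers)}  # cached positions

fillers.remove("f1")             # elements after "f1" shift left by one
stale = index_of["f3"]           # cached position, computed before the removal
print(stale >= len(fillers))     # the cached position now points past the end

def swap_remove(items, positions, name):
    """Remove `name` by moving the last element into its slot, fixing one cache entry."""
    i = positions.pop(name)
    last = items.pop()
    if last != name:
        items[i] = last
        positions[last] = i

fillers = ["f0", "f1", "f2", "f3"]
index_of = {name: i for i, name in enumerate(fillers)}
swap_remove(fillers, index_of, "f1")
print(fillers, index_of)
```

With swap-and-pop, only the moved element's cached position has to be updated, so the remaining references stay valid.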

gudeh avatar Apr 16 '25 21:04 gudeh

In this branch I implemented the removal of fillers and used it in routability instead of increasing density. I would like to test this with the design from this issue, but I am getting the following error, which seems to be caused by a newer OR version:

Error: sc_global_placement.tcl, 5 bad option "./sc_manifest.tcl": must be -encoding

It would be interesting to see the results with this design.

So far I checked with sky130hd/aes, and it spreads more, as expected:

Increasing density (current default): Image

Removing fillers: Image

gudeh avatar Apr 30 '25 17:04 gudeh

The error is because of the switch in OpenSTA. You can just search for "source -echo" and replace with "source" in the tcl files. That should get you around this. It looks like the spreading is making a good difference.
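
The search-and-replace suggested here could be scripted along these lines. This is a minimal sketch: the helper name `strip_source_echo` is hypothetical, it only handles the exact string "source -echo", and it only touches files ending in .tcl. Later comments in this thread report that the replacement alone was not enough for this testcase, because a redirect was also involved.

```python
import tempfile
from pathlib import Path

def strip_source_echo(root):
    """Rewrite 'source -echo' to 'source' in every .tcl file under root.

    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.tcl")):
        text = path.read_text()
        new_text = text.replace("source -echo", "source")
        if new_text != text:
            path.write_text(new_text)
            changed.append(path)
    return changed

# Small self-contained demo on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo.tcl"
    script.write_text("source -echo ./sc_manifest.tcl\n")
    strip_source_echo(tmp)
    print(script.read_text())
```

For the testcase in this issue, the function would be pointed at the unpacked directory, for example `strip_source_echo("sc_issue_darkriscv_job0_ihp130_sg13g2_stdcell_place")`.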

gadfort avatar Apr 30 '25 17:04 gadfort

@eder-matheus

gudeh avatar Apr 30 '25 19:04 gudeh

You can also replace source with import if you want to keep the -echo.

maliberty avatar Apr 30 '25 22:04 maliberty

I tried both ways: replacing all occurrences of "source -echo" with "source", and replacing "source" with "import" in all Tcl files. Unfortunately, both attempts resulted in an error.

gudeh avatar May 01 '25 10:05 gudeh

I forgot a redirect was there too. This should get you back:

new.tar.gz

gadfort avatar May 01 '25 12:05 gadfort

Thanks @gadfort, now it works.

Default: Image

Removing fillers with default inflation: Image

Removing fillers with increased inflation: Image

gudeh avatar May 01 '25 14:05 gudeh

@gudeh great, the one with greater inflation looks better

gadfort avatar May 01 '25 14:05 gadfort

This design specifically seems to require more inflation, even with the new filler-removal feature. We could think of a way to dynamically modify the inflation values.

gudeh avatar May 01 '25 16:05 gudeh

Are you still applying cell bloating along with filler removal?

maliberty avatar May 01 '25 22:05 maliberty

Yes! I did not touch that part. I only replaced the part where we modify the target density with a function that reduces the number of fillers.

gudeh avatar May 02 '25 17:05 gudeh

@gudeh when I tried it, it still quit after 3 non-improving iterations, which suggests something may be getting stuck, since there is plenty of room to continue expanding.

gadfort avatar May 02 '25 17:05 gadfort

Yes, we still observe the three consecutive iterations without improvement in routability. I noticed that for this design, we see low area inflation values (around 1%). Increasing the inflation parameters leads to more area inflation and slightly less congestion, but the differences are small.

I would need to double-check, but I have the impression that in other designs, we also don’t see much area inflation (around 1% per iteration also), yet congestion improves more noticeably. It might be that in this case, congestion is too centralized over a large area, making it harder for inflation to improve routability, but I would have to understand why.

Also, currently, we only display the delta inflation between consecutive iterations. I have a branch where I added the final total inflation; I should open a PR for this.
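
Accumulating the per-iteration deltas into a total is straightforward. The sketch below is hypothetical: only the 411.943 um^2 delta appears in the log from this issue, while the other deltas, the base area, and the `TotalInflatedArea` label are made up for illustration and are not an existing GPL message.

```python
def total_inflation(base_area, deltas):
    """Return (total added area, total percentage relative to base_area)."""
    total = sum(deltas)
    return total, 100.0 * total / base_area

# Hypothetical per-iteration deltas in um^2; only the last value comes from the log.
deltas = [1650.0, 1200.0, 411.943]
base_area = 165000.0   # hypothetical pre-inflation area in um^2
added, pct = total_inflation(base_area, deltas)
print(f"TotalInflatedArea: {added:.3f} um^2 (+{pct:.2f}%)")
```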

gudeh avatar May 02 '25 17:05 gudeh