vtr-verilog-to-routing icon indicating copy to clipboard operation
vtr-verilog-to-routing copied to clipboard

Cluster attraction groups for tight floorplan constraints not working

Open vaughnbetz opened this issue 2 years ago • 1 comments

The merge of Mohamed Elgammal's new clustering api and refactoring somehow broke cluster attraction groups, which are necessary for tight floorplan region constraints. They should be fixed.

Expected Behaviour

Attraction groups should force a dense packing on parts of a design with tight floorplan constraints.

Current Behaviour

Instead, sfkhalid tells me that they currently don't work (in fact area seems to increase in later clustering attempts) so something is broken.

Possible Solution

These did work, so presumably something was broken in the refactoring.

Steps to Reproduce

I believe the failures happen on the tests Sarah has on her own branch, but which aren't yet on master.

Context

Need this fixed before recommending people use the floorplan constraints. This is also holding up putting a full set of tests for floorplanning into CI, which is dangerous as more things could break without anyone knowing.

Your Environment

VTR master in late May, Linux

vaughnbetz avatar Jul 08 '22 00:07 vaughnbetz

@vaughnbetz @sfkhalid I've been looking into using fp constraints and thought it would be worth sharing some things I've come across.

This PR #1938 is where things break for a very tightly constrained designed I've been working on. Before master containing Mohamed's first clustering API PR was merged into sfkhalid's branch with this commit (I think): a806ffbc384ee4bbf9f3d2b9580bf67fa6a15f9b, this assertion fails: File: vtr-verilog-to-routing/vpr/src/pack/pack.cpp Line:200 Message: Failed to find device which satisifies resource requirements (bug since my target arch has more than enough primitives for the design)

Mohamed's first clustering API PR #1954 was not merged in at that point in the branch I think based on the commit messages . After sfkhalid changed the packflow in this commit: 968e82cc86848334a801f89606ec5b55d52ce353 , the thrown assertion changes to the one happening on master right now. Behaviour is that the primitives are too tightly packed for a "manual" placement done using placement_constraints.xml though the exact pattern of the invalid packing for mycase are different between the above commits and the one happening on master.

Maybe this bug somehow slipped through the cracks before the API changes were done and were introduced by the code in the working branch of #1938 ?

ArashAhmadian avatar Aug 02 '22 20:08 ArashAhmadian

@zhaisitong: can you take a look at this one? We can discuss Thursday.

vaughnbetz avatar Sep 06 '22 18:09 vaughnbetz

@sfkhalid : any updates?

vaughnbetz avatar Sep 06 '22 18:09 vaughnbetz

@zhaisitong: can you take a look at this one? We can discuss Thursday.

How could I run to the failed place? What is the exact command or what regressions should I run to get this bug?

zhaisitong avatar Sep 06 '22 19:09 zhaisitong

Good question Sitong. I don't think the failing tests were integrated into CI yet (as they were failing), so they don't run automatically.

@sfkhalid can you please point Sitong at the failed regression tests and give steps on how to reproduce the failures? Adding @MohamedElgammal in case he knows how to run / reproduce these tests too.

vaughnbetz avatar Sep 06 '22 19:09 vaughnbetz

I am not sure which tests are currently failing. However, I will attach a conversation I had with Sara before in which she pointed me how to manually run floorplanning tests

"The tests should be run with the following commands:

Test 1:

../vtr-verilog-to-routing/vpr/vpr ../stratixiv_arch.timing.xml ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --initial_pres_fac 1.0 --router_profiler_astar_fac 1.5 --seed 3 --sdc_file ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.sdc --read_vpr_constraints sixteenth.xml --device neuron

Test 2:

../vtr-verilog-to-routing/vpr/vpr ../stratixiv_arch.timing.xml ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --initial_pres_fac 1.0 --router_profiler_astar_fac 1.5 --seed 3 --sdc_file ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.sdc --read_vpr_constraints one_big_partition.xml --device neuron

Test 3:

../vtr-verilog-to-routing/vpr/vpr ../stratixiv_arch.timing.xml ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --initial_pres_fac 1.0 --router_profiler_astar_fac 1.5 --seed 3 --sdc_file ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.sdc --read_vpr_constraints half_blocks_half.xml --device neuron

A few things to note:

vtr-verilog-to-routing is the VTR root directory in these commands. Please use the architecture file I've attached to this email when running VPR. It has the device size specification that is needed for the --device command I've attached the three constraints files used in the tests to this email - sixteenth.xml, one_big_partition.xml, and half_blocks_half.xml " Archive.zip

MohamedElgammal avatar Sep 06 '22 20:09 MohamedElgammal

@MohamedElgammal I didn't find the architecture file in the "Archive.zip". I tried the stratixiv_arch.xml in the vtr_flow, but it doesn't work. Could you upload the architecture file again?

zhaisitong avatar Sep 07 '22 20:09 zhaisitong

@zhaisitong Kindly find attached the updated architecture file stratixiv_arch.timing.xml.zip

MohamedElgammal avatar Sep 08 '22 15:09 MohamedElgammal

@MohamedElgammal Thanks for your architecture file. Which branch and commit id could I use to reproduce this problem? Could I use the newest codes on the master branch?

zhaisitong avatar Sep 08 '22 16:09 zhaisitong

@zhaisitong, some useful docs: https://docs.verilogtorouting.org/en/latest/vpr/placement_constraints/?highlight=placement%20constraints#vpr-placement-constraints Will also email Sarah's thesis.

vaughnbetz avatar Sep 08 '22 17:09 vaughnbetz

@sfkhalid : can you point Sitong at a working branch or commit so he can see what the code should do?

vaughnbetz avatar Sep 08 '22 17:09 vaughnbetz

@MohamedElgammal When did you merge your API for sfkhalid's codes? Could the codes work after you merged your API? Could you specify me the commit id of your merge?

zhaisitong avatar Sep 09 '22 18:09 zhaisitong

@zhaisitong I didn't merge my API into Sarah's code. I did some refactoring for the original packing code and the first function of my API into master in PR #2034 . Then, Sarah merged these changes into her branch and pushed it into master. Finally, I have added the remaining parts of my API in follow-up PRs #2044 , #2051

MohamedElgammal avatar Sep 09 '22 19:09 MohamedElgammal

I traced back and found this problem comes from @jmah76 committed on Jun 8, the commit id is d22193d4c8030a24daa73884b60d9bdb8d0aec0b. I think @MohamedElgammal had pushed all the relevant code and worked at that time?

zhaisitong avatar Sep 11 '22 00:09 zhaisitong

@zhaisitong Yes, all the refactoring commits were merged into master on May 27

MohamedElgammal avatar Sep 11 '22 13:09 MohamedElgammal

That commit is just a command line description update; it's not clear to me how that could cause a problem. Have you confirmed the code just before that passes the tests?

vaughnbetz avatar Sep 11 '22 16:09 vaughnbetz

@vaughnbetz Yes, it is weird. But I run 5 times, the results are the same. When I checked out to each commits, I found the codes are very different. I am still comparing them.

zhaisitong avatar Sep 11 '22 16:09 zhaisitong

I think this might be because @jmah76 forgot to update codes before pushing them. After I recover all relevant codes, the three commands mentioned above works.

Above is incorrect, this problem is because of other reasons!

zhaisitong avatar Sep 12 '22 03:09 zhaisitong

The problem might come from a14647570ac98df63b1871508a73496b64fbf31f. In function vpr/src/pack/cluster_util.cpp: get_highest_gain_molecule, before finding unpacked molecules based on attraction group of the current cluster, missing the condition "cur_pb->pb_stats->num_feasible_blocks == 0".

After adding this condition, the above three cases work.

zhaisitong avatar Sep 13 '22 04:09 zhaisitong

This issue has been solved with PR #2160 .

zhaisitong avatar Sep 13 '22 16:09 zhaisitong

@zhaisitong @vaughnbetz Thanks for following up on this :)

ArashAhmadian avatar Sep 13 '22 16:09 ArashAhmadian