
VPR Placer runtime issue when all design clusters have fixed locations

Open rachelselinar opened this issue 6 months ago • 6 comments

VPR Placer has a large runtime when all the input design clusters have fixed locations.

Expected Behaviour

When all clusters are fixed, VPR's place stage should complete very quickly, since every cluster is already placed.

Current Behaviour

When all clusters have fixed locations, VPR's placer takes a very long time to complete. In contrast, when only the IO/PLL clusters are fixed, the annealing placer completes much faster.

Example / Steps to Reproduce

I used the Titan23 benchmarks, and all designs behave similarly. The example below, from the 'gaussianblur' design, shows how large the runtime difference can be.

(i) All cluster locations are fixed

Command:

vpr gaussianblur.400x296.stratixiv_arch.timing.xml gaussianblur.blif --place --route --timing_analysis on --route_chan_width 300 --max_router_iterations 400 --astar_fac 1 --verify_file_digests off --timing_report_detail aggregated --device 400x296 --sdc_file gaussianblur.sdc --fix_clusters gaussianblur.fix_clusters --net_file gaussianblur.net

Placement Snippet from vpr_stdout.log:

Moves per temperature: 2498181
Warning 567: Starting t: 0 of 105710 configurations accepted.


Tnum  Time (sec)  T  Av Cost  Av BB Cost  Av TD Cost  CPD (ns)  sTNS (ns)  sWNS (ns)  Ac Rate  Std Dev  R lim  Crit Exp  Tot Moves  Alpha
1 106145.3 0.0e+00 1.000 90098.80 0.00054943 794.866 -7.11e+07 -793.866 0.000 0.0000 399.0 1.00 2498181 0.200
2 66540.4 0.0e+00 1.000 90098.80 0.00054943 794.866 -7.11e+07 -793.866 0.000 0.0000 399.0 1.00 4996362 0.950

Placement Quench took 66540.41 seconds (max_rss 22758.9 MiB)

post-quench CPD = 794.866 (ns)

BB estimate of min-dist (placement) wire length: 27029640

Completed placement consistency check successfully.

(ii) Only IO/PLL cluster locations are fixed:

Command:

vpr gaussianblur.400x296.stratixiv_arch.timing.xml gaussianblur.blif --init_place_file gaussianblur.place --fix_clusters gaussianblur.io_pll.fix_clusters --place --route --timing_analysis on --route_chan_width 300 --max_router_iterations 400 --astar_fac 1 --verify_file_digests off --timing_report_detail aggregated --device 400x296 --sdc_file gaussianblur.sdc --net_file gaussianblur.net

Placement Snippet from vpr_stdout.log:

Moves per temperature: 2498181
Warning 567: Starting t: 801 of 105710 configurations accepted.


Tnum  Time (sec)  T  Av Cost  Av BB Cost  Av TD Cost  CPD (ns)  sTNS (ns)  sWNS (ns)  Ac Rate  Std Dev  R lim  Crit Exp  Tot Moves  Alpha
1 123.9 1.4e-04 6.004 846618.76 0.0014513 799.617 -8.1e+07 -798.617 0.858 1.6785 399.0 1.00 2498181 0.200
2 126.7 1.2e-04 1.065 1045081.66 0.0010934 2501.707 -3.03e+08 -2500.707 0.972 0.0137 399.0 1.00 4996362 0.900
3 128.3 6.2e-05 0.973 1042872.91 0.0012203 3923.444 -3.05e+08 -3922.444 0.954 0.0036 399.0 1.00 7494543 0.500
4 129.4 5.6e-05 0.977 1040438.53 0.0010769 3697.168 -3.04e+08 -3696.168 0.948 0.0042 399.0 1.00 9992724 0.900
5 130.3 5.0e-05 0.988 1038464.20 0.0011874 3238.814 -3.27e+08 -3237.814 0.944 0.0021 399.0 1.00 12490905 0.900
6 130.2 4.5e-05 0.978 1035637.29 0.0010805 3219.246 -3.18e+08 -3218.246 0.941 0.0032 399.0 1.00 14989086 0.900
7 132.0 4.1e-05 0.983 1035091.99 0.0012707 3066.179 -3.16e+08 -3065.179 0.933 0.0023 399.0 1.00 17487267 0.900
8 132.2 3.7e-05 0.983 1033164.71 0.0011694 2955.221 -3.13e+08 -2954.221 0.928 0.0023 399.0 1.00 19985448 0.900
9 134.1 3.3e-05 0.980 1031057.35 0.0011307 2851.145 -2.93e+08 -2850.145 0.921 0.0034 399.0 1.00 22483629 0.900
10 134.6 3.0e-05 0.986 1029670.86 0.0012626 2423.375 -3.19e+08 -2422.375 0.915 0.0020 399.0 1.00 24981810 0.900
...
100 101.9 1.8e-07 0.998 105965.24 1.9476e-05 783.091 -1.03e+08 -782.091 0.318 0.0006 2.7 7.97 249818100 0.950
101 101.5 1.7e-07 0.998 105678.35 1.9439e-05 783.138 -1.02e+08 -782.138 0.302 0.0005 2.3 7.98 252316281 0.950
102 101.3 1.6e-07 0.999 105452.23 1.9483e-05 782.958 -1.02e+08 -781.958 0.284 0.0005 2.0 7.98 254814462 0.950
103 101.6 1.5e-07 0.998 105173.43 1.9406e-05 783.111 -1.02e+08 -782.111 0.430 0.0005 1.7 7.99 257312643 0.950
104 101.5 1.5e-07 0.999 104988.39 1.9386e-05 782.943 -1.02e+08 -781.943 0.412 0.0003 1.7 7.99 259810824 0.950
105 100.9 1.4e-07 0.999 104812.00 1.8895e-05 783.019 -1.02e+08 -782.019 0.397 0.0003 1.6 7.99 262309005 0.950
106 101.0 1.3e-07 0.999 104640.99 1.9355e-05 783.120 -1.02e+08 -782.120 0.379 0.0003 1.6 7.99 264807186 0.950
107 101.6 1.3e-07 0.999 104484.99 1.8941e-05 782.887 -1.02e+08 -781.887 0.364 0.0003 1.5 7.99 267305367 0.950
108 101.3 1.2e-07 0.999 104334.25 1.9334e-05 783.284 -1.02e+08 -782.284 0.346 0.0003 1.4 7.99 269803548 0.950
109 100.9 1.1e-07 0.999 104198.63 1.9255e-05 782.902 -1.02e+08 -781.902 0.329 0.0003 1.2 8.00 272301729 0.950
110 100.8 1.1e-07 0.999 104060.84 1.937e-05 783.105 -1.02e+08 -782.105 0.312 0.0003 1.1 8.00 274799910 0.950
111 100.7 1.0e-07 0.999 103926.18 1.9347e-05 782.882 -1.02e+08 -781.882 0.296 0.0002 1.0 8.00 277298091 0.950
112 100.2 9.8e-08 0.999 103813.07 1.9351e-05 783.188 -1.01e+08 -782.188 0.281 0.0002 1.0 8.00 279796272 0.950
113 100.0 9.3e-08 0.999 103700.57 1.9312e-05 783.090 -1.02e+08 -782.090 0.265 0.0002 1.0 8.00 282294453 0.950
114 99.5 8.8e-08 0.999 103600.40 1.9289e-05 783.259 -1.02e+08 -782.259 0.251 0.0002 1.0 8.00 284792634 0.950
115 99.3 8.4e-08 0.999 103496.22 1.9182e-05 782.929 -1.02e+08 -781.929 0.237 0.0002 1.0 8.00 287290815 0.950
116 99.2 7.9e-08 0.999 103395.91 1.9239e-05 783.180 -1.02e+08 -782.180 0.221 0.0002 1.0 8.00 289788996 0.950
117 99.0 7.6e-08 1.000 103315.76 1.9263e-05 782.944 -1.02e+08 -781.944 0.207 0.0002 1.0 8.00 292287177 0.950
118 98.7 7.2e-08 1.000 103233.56 1.9216e-05 783.056 -1.02e+08 -782.056 0.194 0.0002 1.0 8.00 294785358 0.950
119 98.8 6.8e-08 1.000 103162.01 1.9174e-05 783.219 -1.02e+08 -782.219 0.181 0.0002 1.0 8.00 297283539 0.950
120 98.6 6.5e-08 1.000 103093.70 1.9185e-05 783.123 -1.03e+08 -782.123 0.169 0.0001 1.0 8.00 299781720 0.950
121 98.6 6.2e-08 1.000 103027.63 1.9178e-05 783.001 -1.02e+08 -782.001 0.158 0.0001 1.0 8.00 302279901 0.950
122 98.3 5.8e-08 1.000 102973.50 1.9192e-05 783.163 -1.02e+08 -782.163 0.146 0.0001 1.0 8.00 304778082 0.950
123 97.7 4.7e-08 0.999 102855.63 1.913e-05 782.973 -1.02e+08 -781.973 0.111 0.0002 1.0 8.00 307276263 0.800
Agent's 2nd state: Checkpoint saved: bb_costs=102820, TD costs=1.92158e-05, CPD=782.944 (ns)
124 96.8 3.7e-08 1.000 102746.76 1.9211e-05 782.944 -1.02e+08 -781.944 0.064 0.0002 1.0 8.00 309774444 0.800
125 96.6 3.0e-08 1.000 102678.19 1.9135e-05 783.157 -1.02e+08 -782.157 0.046 0.0001 1.0 8.00 312272625 0.800
126 96.6 2.4e-08 1.000 102633.81 1.9131e-05 782.944 -1.02e+08 -781.944 0.034 0.0001 1.0 8.00 314770806 0.800
127 96.2 1.9e-08 1.000 102604.47 1.9148e-05 783.063 -1.02e+08 -782.063 0.025 0.0001 1.0 8.00 317268987 0.800
128 96.1 1.5e-08 1.000 102590.45 1.9118e-05 782.944 -1.02e+08 -781.944 0.019 0.0000 1.0 8.00 319767168 0.800
Checkpoint saved: bb_costs=102588, TD costs=1.91182e-05, CPD=782.939 (ns)
129 96.0 1.2e-08 1.000 102581.34 1.9117e-05 782.939 -1.02e+08 -781.939 0.015 0.0000 1.0 8.00 322265349 0.800
130 95.9 9.8e-09 1.000 102575.79 1.9107e-05 783.129 -1.02e+08 -782.129 0.011 0.0000 1.0 8.00 324763530 0.800
Checkpoint saved: bb_costs=102576, TD costs=1.91221e-05, CPD=782.878 (ns)
131 96.0 7.8e-09 1.000 102572.58 1.9121e-05 782.878 -1.02e+08 -781.878 0.009 0.0000 1.0 8.00 327261711 0.800
132 95.9 6.3e-09 1.000 102571.26 1.9109e-05 783.129 -1.02e+08 -782.129 0.007 0.0000 1.0 8.00 329759892 0.800
133 95.7 5.0e-09 1.000 102569.54 1.9121e-05 782.939 -1.02e+08 -781.939 0.006 0.0000 1.0 8.00 332258073 0.800
134 95.6 0.0e+00 1.000 102569.82 1.9114e-05 783.129 -1.02e+08 -782.129 0.001 0.0000 1.0 8.00 334756254 0.800

Placement Quench took 95.62 seconds (max_rss 22746.4 MiB)

post-quench CPD = 782.939 (ns)

Checkpoint restored

BB estimate of min-dist (placement) wire length: 30772745

Completed placement consistency check successfully.

Inputs for the above example can be found in this shared drive.

Context

Trying to compare 'no refinement' vs 'refinement' of cluster locations in VPR.

Your Environment

  • VPR used: https://github.com/verilog-to-routing/vtr-verilog-to-routing/commit/eb3c95d
  • Operating System and version: Ubuntu 20.04.6 LTS

rachelselinar avatar Feb 05 '24 17:02 rachelselinar

Thank you for opening this issue.

It seems that some move generators call the pick_from_block() function to select a block randomly. This function has a while loop that tries to find a movable block. Since all blocks in your run are fixed, the loop's exit condition is not met until all possible options (all clustered blocks) are exhausted, and this happens on every attempted move.

https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/451fb4dba04a535be9a9c3845f5e4b11c92fae95/vpr/src/place/move_utils.cpp#L595-L626

If you want to skip placement altogether, I guess you can pass the placement file using the --place_file option.

@vaughnbetz What is your opinion on this? Should I change the pick_from_block() function? We can fix this by trying only a few clustered blocks to find a movable one. Alternatively, we can store all movable blocks in a separate container and select an element of this container randomly.

soheilshahrouz avatar Feb 06 '24 14:02 soheilshahrouz

Thanks @soheilshahrouz . Probably we should put all the movable blocks (and only movable blocks) in a container and select them randomly, so we are always efficient.
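
The container approach could look roughly like the sketch below. It is illustrative Python rather than VPR's C++, and the names (`MovableBlockPicker`, `pick`, `is_fixed`) are hypothetical, not taken from move_utils.cpp. The point it shows is that when no block is movable, the picker reports failure immediately instead of scanning every clustered block.

```python
import random
from typing import Optional, Sequence


class MovableBlockPicker:
    """Picks a random movable block in constant time per call.

    Hypothetical helper for illustration; VPR's real data structures differ.
    """

    def __init__(self, is_fixed: Sequence[bool]) -> None:
        # Build the list of movable block ids once, up front.
        self.movable = [blk for blk, fixed in enumerate(is_fixed) if not fixed]

    def pick(self, rng: random.Random) -> Optional[int]:
        # No movable block: report failure right away, with no retry loop.
        if not self.movable:
            return None
        return rng.choice(self.movable)


# All blocks fixed: every pick fails immediately.
all_fixed = MovableBlockPicker([True] * 1000)
# Only block 1 is movable, so it is the only possible result.
one_free = MovableBlockPicker([True, False, True])
```

With this layout the cost of a failed pick no longer depends on the number of clustered blocks, which is the case that dominates when everything is fixed.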

vaughnbetz avatar Feb 06 '24 16:02 vaughnbetz

I was able to convert a .fix_clusters file to a .place file by adding a '0' subtile entry to every line and prepending the netlist checksum and array size information as a header.

By passing the .place file using the --place_file option, the router ran successfully (and much faster) and the results are the same.
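
A conversion along these lines can be sketched as follows. This is a minimal sketch under assumptions, not the script actually used: it assumes each .fix_clusters line is `block_name x y`, and the function name `fix_clusters_to_place` and the placeholder digest are hypothetical. The header layout follows the gaussianblur header quoted later in this thread.

```python
def fix_clusters_to_place(fix_lines, net_file, netlist_id, width, height):
    """Build .place file text from .fix_clusters lines.

    Assumes each input line is 'block_name x y'; a '0' subtile is appended.
    """
    out = [
        f"Netlist_File: {net_file} Netlist_ID: {netlist_id}",
        f"Array size: {width} x {height} logic blocks",
        "",  # the empty third header line
    ]
    for line in fix_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        name, x, y = fields[:3]
        out.append(f"{name} {x} {y} 0")
    return "\n".join(out) + "\n"


# Hypothetical block names and digest, for illustration only.
text = fix_clusters_to_place(
    ["blk_a 166 126", "# comment", "blk_b 3 4"],
    "gaussianblur.net", "SHA256:<digest>", 400, 296,
)
```

The checksum and array size still have to come from a real VPR run on the same netlist and device, as described in the follow-up comment.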

Thank you

rachelselinar avatar Feb 12 '24 16:02 rachelselinar

Great, thanks Rachel. I think without the checksum it will just give a warning, so if that's a pain to maintain I think you can skip it (it's intended to protect people from accidentally using a .place file for the wrong circuit or architecture).

vaughnbetz avatar Feb 12 '24 16:02 vaughnbetz

I was able to obtain the netlist checksum and array size information from runs that did not fix all clusters in the .fix_clusters file. In addition, the VPR placer errors out if the .place file doesn't have these 3 lines in its header:

  1. Netlist checksum
  2. Array size
  3. Empty line

Without either of the first two (or with 3 empty lines as the header), the placer errors out at the first entry:

Error 1:
Type: Placement file
File: ../gaussianblur.place
Line: 3
Message: Invalid line 'step0:grp_step0_fu_168|step0_grp_fu_2167_ACMP_fmul_6:step0_grp_fu_2167_ACMP_fmul_6_U|ACMP_fmul:ACMP_fmul_U|AESL_WP_FMul:ACMP_FMul_U|lpm_mult:Mult0|mult_8at:auto_generated|mac_mult2 166 126 0 0 #0' in placement file header

If only the checksum and array size are provided, without an empty line, the placer errors out:

Error 1:
Type: Placement
File: ~/vtr-verilog-to-routing/vpr/src/base/read_place.cpp
Line: 297
Message: Block 0 has not been read from the place file.

For gaussianblur, the .place file generated by VPR placer contains this header:

Netlist_File: gaussianblur.net Netlist_ID: SHA256:d6e68433c9959dc959dc0758f8f9f026cd79b00d8fbbfce6c211b820e50589dd
Array size: 400 x 296 logic blocks

#block name x y subblk layer block number
#---------- -- -- ------ ----- ------------
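
Before handing a hand-built .place file to VPR, the three header lines can be checked up front. The sketch below is based only on the behavior described in this thread; `check_place_header` is a hypothetical helper, not part of VPR, and the string tests are an assumption about what makes a header line acceptable.

```python
def check_place_header(lines):
    """Return a list of problems found in the first three lines."""
    problems = []
    head = list(lines[:3]) + [None] * 3  # pad so short files don't raise
    if head[0] is None or "Netlist_ID:" not in head[0]:
        problems.append("line 1: missing netlist checksum")
    if head[1] is None or not head[1].startswith("Array size:"):
        problems.append("line 2: missing array size")
    if head[2] is None or head[2].strip() != "":
        problems.append("line 3: expected an empty line")
    return problems


# Hypothetical file contents, for illustration only.
good = [
    "Netlist_File: a.net Netlist_ID: SHA256:<digest>",
    "Array size: 400 x 296 logic blocks",
    "",
    "blk_a 1 2 0",
]
missing_blank = good[:2] + ["blk_a 1 2 0"]
```

Here `check_place_header(good)` reports no problems, while `missing_blank` is flagged for its third line, matching the second error case above.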

rachelselinar avatar Feb 12 '24 16:02 rachelselinar

Thanks; we should probably make this a bit more robust (keep parsing and give a warning).

vaughnbetz avatar Feb 12 '24 19:02 vaughnbetz