gpl: slowdown when activating multi-threading
Description
This issue was discovered while addressing non-determinism with GPL when multi-threading was activated (https://github.com/The-OpenROAD-Project/OpenROAD/issues/5360).
I tested the GPL runtime using secure-CI (multiple designs) and locally via a GCP server with Nangate45/Swerv. The tests indicate that activating multi-threading results in the same runtime or, in some cases, even slows down GPL.
I believe the slowdown is due to too many context switches with minimal gain in parallelism, as the loops that use multi-threading typically perform simple mathematical operations.
- The OMP parallel loops in GPL were reportedly implemented by Antmicro developers, who claimed they improved runtime. TODO: Use GitHub blame to find the code that was added and check whether any runtime reports are available.
- Investigate which loops are causing the slowdown. Measure runtime.
I suggest you look at the PR, as PRs often include data. There may have been multiple changes.
This is the PR: https://github.com/The-OpenROAD-Project/OpenROAD/pull/4580.
Hi @kbieganski, do you still see improvements in runtime for gpl with MT?
It could be that MT only helps on larger designs and is a net negative on designs below a certain size threshold.
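To make the size-threshold idea concrete, here is a toy cost model, not a measurement of GPL. It assumes each parallel region pays a fixed overhead (thread wake-up, scheduling, barrier) and that the work otherwise divides perfectly across threads. The function names and the example numbers in the usage note are hypothetical.

```python
def serial_time(n_items, per_item_s):
    """Time to run the loop on one thread."""
    return n_items * per_item_s


def parallel_time(n_items, per_item_s, threads, overhead_s):
    """Toy model: fixed per-region overhead plus perfectly divided work."""
    return overhead_s + n_items * per_item_s / threads


def break_even_items(per_item_s, threads, overhead_s):
    """Loop size at which the parallel and serial times are equal.

    Solving n*c = o + n*c/t for n gives n = o*t / (c*(t - 1)).
    Below this size the parallel version is slower in this model.
    """
    if threads < 2:
        raise ValueError("break-even is only defined for 2 or more threads")
    return overhead_s * threads / (per_item_s * (threads - 1))
```

With an assumed 10 ns per iteration, 8 threads, and 20 µs of overhead per region, `break_even_items(1e-8, 8, 2e-5)` is roughly 2286 iterations. A loop doing simple arithmetic over fewer elements than that would lose time under these assumptions, which is consistent with a slowdown on small designs and a gain on large ones.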
I haven't observed a slowdown caused by multi-threading on 3_3_place_gp in ORFS:
ibex
| Threads | Min [s] | Max [s] | Median [s] | Relative median |
|---|---|---|---|---|
| 1 | 49.90 | 51.29 | 51.05 | 1.04 |
| 8 | 48.06 | 49.41 | 48.88 | 1.00 |
ariane133
| Threads | Min [s] | Max [s] | Median [s] | Relative median |
|---|---|---|---|---|
| 1 | 930.71 | 940.08 | 935.91 | 1.29 |
| 8 | 719.30 | 727.28 | 725.57 | 1.00 |
black_parrot
| Threads | Min [s] | Max [s] | Median [s] | Relative median |
|---|---|---|---|---|
| 1 | 1417.57 | 1432.21 | 1424.02 | 1.25 |
| 8 | 1134.33 | 1146.48 | 1139.92 | 1.00 |
This is on a version of OpenROAD from June.
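For anyone reproducing these tables, the "Relative median" column appears to be each median divided by the fastest median for that design, rounded to two decimals. A minimal sketch under that assumption follows; the helper name is hypothetical and the inputs are the medians reported above.

```python
def relative_medians(median_by_threads):
    """Divide each median runtime by the smallest one, rounded to 2 places."""
    best = min(median_by_threads.values())
    return {t: round(m / best, 2) for t, m in median_by_threads.items()}


# Median runtimes in seconds, keyed by thread count, from the tables above.
reported = {
    "ibex": {1: 51.05, 8: 48.88},
    "ariane133": {1: 935.91, 8: 725.57},
    "black_parrot": {1: 1424.02, 8: 1139.92},
}

for design, medians in reported.items():
    print(design, relative_medians(medians))
```

Read this way, 8 threads gives about a 4% shorter median on ibex and 20 to 22% shorter medians on the two larger designs.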
> I believe the slowdown is due to too many context switches
If you're observing too many context switches, perhaps you're running more threads than physical cores? Set your `-threads` (`NUM_CORES` in ORFS) to the number of physical cores you have. By default it uses `nproc` which, unfortunately, reports the logical cores.
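One way to get a physical core count on Linux is to count the unique (`physical id`, `core id`) pairs in `/proc/cpuinfo`. The sketch below assumes those fields are present, which is typical on x86 but not guaranteed on ARM or inside some VMs. When they are missing it falls back to `os.cpu_count()`, which reports logical CPUs. The function names are hypothetical.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text.

    Returns 0 if the text does not carry both fields.
    """
    cores = set()
    physical_id = core_id = None
    # The extra empty string flushes the last record if the text
    # does not end with a blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the record for one logical CPU.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def physical_core_count(path="/proc/cpuinfo"):
    """Physical cores if detectable, otherwise the logical CPU count."""
    try:
        with open(path) as f:
            n = count_physical_cores(f.read())
    except OSError:
        n = 0
    return n or os.cpu_count()


if __name__ == "__main__":
    print(physical_core_count())
```

The printed value is the number to try for `-threads` or `NUM_CORES`. On a machine with SMT enabled it will usually be half of what `nproc` reports.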