Require a minimum number of volumes to build a BIH tree
Analysis of the tilecal problem confirmed that the BIH is causing a 3x slowdown. Profiling revealed that this is due to increased branching (fewer active threads per warp) which leads to worse memory performance. The tilecal problem itself has a few universes with more than 100 volumes, but many more with fewer than 20. This MR adds a bih_min_size parameter that causes the BIH to only be built for universes with a sufficiently large number of volumes. If a universe has too few volumes, all volumes will be placed into "non-tree volumes," which was previously reserved for volumes with infinite bounding boxes. This results in the following performance on Wildstyle:
| Commit | Tilecal runtime on V100 (s) |
|---|---|
| before BIH was merged: 09cbf5ba999 | 49.5 |
| after BIH was merged: acf500714c27 | 146 |
| develop: a4c469f7d42f | 146 |
| this MR: c37d4d376c3 | 64.6 |
Likewise, this MR provides an 88% increase in active threads per warp w.r.t. develop, as well as the following improved memory behavior (also w.r.t. develop):
This MR is a draft for two reasons:
bih_min_sizeis a tuning parameter. We need to come up with a better system for incorporating these in the code.- It is not clear what to tune this parameter to. Obviously a single test on a V100 is not sufficient. Would the entire suite on an MI250X be sufficient? Can we define and overall FOM for the suite, in the event that performance gains for one problem are accompanied by performance losses in another?
Once we resolve these issues I will update this MR accordingly.
Test summary
466 files 727 suites 23s :stopwatch: 1 169 tests 1 150 :white_check_mark: 14 :zzz: 5 :x: 2 358 runs 2 340 :white_check_mark: 8 :zzz: 10 :x:
For more details on these failures, see this check.
Results for commit c37d4d37.
Draft until this is refactor to instead set a maximum leaf size
Closing this MR; the minimum leaf size approach used here is superior: #1930
Determining this parameter will done with benchmark problems on ANL JLSE per https://github.com/celeritas-project/regression/issues/3.