Implement Bounding Box Hierarchy in `Path`
Split from https://github.com/ppy/osu-framework/pull/6613 which serves as base implementation of BBH inside a Path. By it's own this pr only improves Path.ReceivePositionalInputAt performance.
Structure
When passing path vertices to the PathBBH we will be building a binary tree (from bottom to top), in which most-bottom row will contain path segments and their bounding boxes. Then we will update all the parent nodes with combined bounds of left and right children.
Of course not all paths have 2^n segments, so we need to create such a tree which will be able to scale to any amount of segments without having empty nodes to reduce memory needed to store them. All the nodes are stored in an array of size roughly ~2n segment count in the following order: [nodes from the 0 depth from left to right][nodes from the 1 depth from left to right]...[nodes from the n-1 depth from left to right][leafs(segments)]. Array will be populated from end to start starting with leafs and then their parents and their parents and so on. And in the end the whole tree is built in just 1 cycle.
Path.SetVertices benchmark
master:
| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Compute100Segments | 592.5 ns | 1.81 ns | 1.69 ns | - |
| Compute1KSegments | 5,887.3 ns | 110.78 ns | 98.20 ns | - |
| Compute10KSegments | 59,171.8 ns | 127.50 ns | 113.03 ns | - |
| Compute100KSegments | 569,454.3 ns | 2,341.02 ns | 2,075.25 ns | - |
| Compute1MSegments | 5,895,614.8 ns | 22,614.86 ns | 20,047.48 ns | - |
pr:
| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Compute100Segments | 2.234 us | 0.0027 us | 0.0021 us | 56 B |
| Compute1KSegments | 22.013 us | 0.0453 us | 0.0424 us | 56 B |
| Compute10KSegments | 365.738 us | 2.3118 us | 2.0493 us | 56 B |
| Compute100KSegments | 4,013.083 us | 7.9266 us | 7.4145 us | - |
| Compute1MSegments | 41,526.195 us | 65.9597 us | 61.6988 us | - |
pr (after https://github.com/ppy/osu-framework/pull/6613/commits/06d895dcd3e8c719e0e7b0b4c359fa1aac092a64):
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| Compute100Segments | 1.508 us | 0.0053 us | 0.0049 us | 0.0019 | 56 B |
| Compute1KSegments | 15.007 us | 0.0308 us | 0.0288 us | - | 56 B |
| Compute10KSegments | 152.082 us | 0.5901 us | 0.5231 us | - | 56 B |
| Compute100KSegments | 1,535.200 us | 2.7969 us | 2.6163 us | - | 57 B |
| Compute1MSegments | 21,892.779 us | 80.7825 us | 63.0697 us | - | - |
While pr values are about ~2.5x slower (with more micro-optimisations can be improved further, target would be 2x given amount of nodes processed is ~2x segment count), it's worth noting that in case of master (with snaking sliders) we will see these timings each frame and with https://github.com/ppy/osu-framework/pull/6613 - only once, and then timings from Path.SetStartProgress table (which are basically nothing)
Path.ReceivePositionalInputAt benchmark
master:
| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Contains100 | 302.0 ns | 0.63 ns | 0.56 ns | - |
| Contains1K | 2,906.9 ns | 12.67 ns | 11.85 ns | - |
| Contains10K | 76,607.3 ns | 267.95 ns | 250.64 ns | - |
| Contains100K | 868,608.0 ns | 2,824.05 ns | 2,641.61 ns | - |
| Contains1M | 8,683,885.7 ns | 106,430.14 ns | 99,554.82 ns | - |
pr:
| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Contains100 | 195.1 ns | 0.20 ns | 0.19 ns | - |
| Contains1K | 289.8 ns | 0.21 ns | 0.20 ns | - |
| Contains10K | 444.2 ns | 0.41 ns | 0.37 ns | - |
| Contains100K | 594.2 ns | 1.17 ns | 1.04 ns | - |
| Contains1M | 945.1 ns | 2.87 ns | 2.54 ns | - |
Input performance showcase
| master | pr |
|---|---|
| https://github.com/user-attachments/assets/32cfe1e8-d9b3-4ace-a200-8ab1f85179b3 | https://github.com/user-attachments/assets/f46a9a52-e76c-40df-abee-59fb23ead737 |