
Implement occupancy for self-collision

Open · aprokop opened this issue 3 years ago · 5 comments

aprokop · Jan 06 '23 18:01

Summit

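Using Kokkos develop (f2da62d0e). In each table below, the first column is the requested occupancy in percent ("default" means no occupancy hint was given) and the second column is the measured time.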

summit_results.zip

DBSCAN (GeoLife, minPts = 2, eps = 1e-4)

default   0.120
10        0.157
20        0.105
30        0.105
40        0.095
50        0.096
60        0.094
70        0.096
80        0.098
90        0.097
100       0.121

DBSCAN (HACC 37M, minPts = 5, eps = 0.042)

default   0.219
10        0.228
20        0.146
30        0.146
40        0.125
50        0.125
60        0.125
70        0.131
80        0.131
90        0.131
100       0.220

DBSCAN (HACC 37M, minPts = 2, eps = 0.042)

default   0.170
10        0.220
20        0.141
30        0.142
40        0.122
50        0.117
60        0.117
70        0.121
80        0.119
90        0.119
100       0.165

DBSCAN (uniform100M3, minPts = 5, eps = 0.002)

default   0.262
10        0.310
20        0.209
30        0.209
40        0.190
50        0.192
60        0.192
70        0.203
80        0.203
90        0.203
100       0.262

Molecular dynamics (100^3)

Count (ArborX::Experimental::HalfNeighborList::Count)

default   4.37e-02
10        3.81e-02
20        3.26e-02
30        3.26e-02
40        3.22e-02
50        3.24e-02
60        3.25e-02
70        3.28e-02
80        3.27e-02
90        3.27e-02
100       3.91e-02

Fill (ArborX::Experimental::HalfNeighborList::Fill)

default   1.27e-01
10        3.75e-02
20        3.09e-02
30        3.09e-02
40        4.33e-02
50        6.30e-02
60        6.28e-02
70        8.14e-02
80        8.19e-02
90        8.10e-02
100       1.24e-01
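The contrast between the two half-neighbor-list kernels can be made explicit with a small analysis helper. This is a sketch, not part of ArborX, and the name `best_occupancy` is hypothetical. It picks the fastest occupancy in each sweep and reports its speedup over the default run, using the Count and Fill numbers copied from the tables above.

```python
# Timings copied from the Count and Fill tables above:
# (default-occupancy time, {occupancy in percent: measured time}).
SWEEPS = {
    "Count": (4.37e-02, {10: 3.81e-02, 20: 3.26e-02, 30: 3.26e-02,
                         40: 3.22e-02, 50: 3.24e-02, 60: 3.25e-02,
                         70: 3.28e-02, 80: 3.27e-02, 90: 3.27e-02,
                         100: 3.91e-02}),
    "Fill": (1.27e-01, {10: 3.75e-02, 20: 3.09e-02, 30: 3.09e-02,
                        40: 4.33e-02, 50: 6.30e-02, 60: 6.28e-02,
                        70: 8.14e-02, 80: 8.19e-02, 90: 8.10e-02,
                        100: 1.24e-01}),
}


def best_occupancy(default_time, sweep):
    """Return (occupancy, speedup over default) for the fastest sweep entry.

    Ties in time are broken in favor of the lower occupancy.
    """
    occupancy = min(sweep, key=lambda occ: (sweep[occ], occ))
    return occupancy, default_time / sweep[occupancy]


for name, (default_time, sweep) in SWEEPS.items():
    occupancy, speedup = best_occupancy(default_time, sweep)
    print(f"{name}: best occupancy {occupancy}%, {speedup:.2f}x vs default")
```

This prints about 1.36x at 40% for Count and 4.11x at 20% for Fill. Because of the tie-break, Fill reports 20% even though 30% gives the same time.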

aprokop · Jan 06 '23 18:01

To emphasize the previous molecular dynamics results:

Default occupancy [2x speedup]

|-> 3.78e-01 sec 41.8% 96.3% 0.0% 1.2% 4.76e+01 1 classic [region]
|-> 2.53e-01 sec 28.0% 78.4% 0.0% 1.3% 8.70e+01 1 half+expand [region]
|-> 1.81e-01 sec 20.0% 93.1% 0.0% 0.7% 9.39e+01 1 full [region]

30% occupancy [5x speedup !!!]

|-> 3.79e-01 sec 54.9% 96.2% 0.0% 1.2% 4.75e+01 1 classic [region]
|-> 1.43e-01 sec 20.8% 64.0% 0.0% 2.2% 1.53e+02 1 half+expand [region]
|-> 7.64e-02 sec 11.1% 83.6% 0.0% 1.6% 2.23e+02 1 full [region]
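One reading of the bracketed speedups, not spelled out in the original, is that they are the ratio of the classic region time to the full region time. Under that reading the numbers check out. Because the classic time is essentially unchanged (3.78e-01 vs 3.79e-01 sec), the gain comes from the full region itself:

```latex
\left.\frac{t_{\text{classic}}}{t_{\text{full}}}\right|_{\text{default}}
  = \frac{3.78\times 10^{-1}}{1.81\times 10^{-1}} \approx 2.09,
\qquad
\left.\frac{t_{\text{classic}}}{t_{\text{full}}}\right|_{30\%}
  = \frac{3.79\times 10^{-1}}{7.64\times 10^{-2}} \approx 4.96,
\qquad
\frac{t_{\text{full}}^{\text{default}}}{t_{\text{full}}^{30\%}}
  = \frac{1.81\times 10^{-1}}{7.64\times 10^{-2}} \approx 2.37
```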

It seems that for the fill kernels, it is really desirable to keep the occupancy low. For the lighter kernels (count, union-find), it matters less. To be fair, we could probably also accelerate the original (classic) kernel by tuning its occupancy.

I am still unsure how to go about it. Should we make it a parameter that can be passed when calling the half traversal?

Or maybe it is better to make it tunable, so that a user can run a kernel a few times and get the best value back.

aprokop · Jan 06 '23 19:01

Just recording that I tried a similar trick with a regular spatial query for FDBSCAN-DenseBox and did not observe a significant improvement (only 4%). Setup: HACC 497M (first 150M points), eps = 0.014, minPts = 2.

default   0.886
10        1.689
20        1.085
30        1.085
40        0.925
50        0.870
60        0.871
70        0.850
80        0.850
90        0.850
100       0.886
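The 4% figure corresponds to the best occupancies (70-90) compared with the default. For contrast, the low end of the sweep is almost twice as slow:

```latex
\frac{t_{\text{default}} - t_{\text{best}}}{t_{\text{default}}}
  = \frac{0.886 - 0.850}{0.886} \approx 0.041,
\qquad
\frac{t_{10\%}}{t_{\text{default}}} = \frac{1.689}{0.886} \approx 1.91
```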

aprokop · Jan 13 '23 22:01

Initial results by @khuck using automated tuning with APEX are promising. For the standard HACC 37M problem, the tool converges to an occupancy value in the 70-90 range, which is in line with our experience.


It’s pretty clear that the kernel times get shorter as the occupancy goes up to ~90, like you said, and then show a small bump in the 90-100 range. After all values were tested, the search converged on 70 for this case. In this example, I am using a tuning “window” of 5: each occupancy value is tested 5 times, and the minimum time is recorded as the response to that setting. This accounts for system noise that might confuse the search, but it makes the search 5x longer (500 of the 600 total iterations). The simulated annealing algorithm requires many more tests before converging, so that search algorithm probably isn’t relevant for this case without some parameter tweaking.
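The exhaustive search with a tuning window described above can be sketched as follows. This is not APEX's implementation or API. `tune_occupancy` and `SyntheticKernel` are hypothetical. The cost curve (fastest at 70) and the periodic noise spikes are made up for illustration. Taking the minimum over the window discards noisy samples as long as at least one run per candidate is clean.

```python
def tune_occupancy(run_kernel, candidates, window=5):
    """Exhaustive search: time each candidate `window` times, keep the minimum.

    Returns (best occupancy, {occupancy: response}). Ties go to the lower
    occupancy.
    """
    responses = {occ: min(run_kernel(occ) for _ in range(window))
                 for occ in candidates}
    best = min(responses, key=lambda occ: (responses[occ], occ))
    return best, responses


class SyntheticKernel:
    """Hypothetical stand-in for launching and timing a real kernel."""

    def __init__(self):
        self.calls = 0

    def __call__(self, occupancy):
        self.calls += 1
        time = 0.1 + 0.001 * abs(occupancy - 70)  # made-up curve, fastest at 70
        if self.calls % 3 == 0:
            time += 0.05  # periodic "system noise" spike
        return time


kernel = SyntheticKernel()
best, responses = tune_occupancy(kernel, range(1, 101), window=5)
print(best, kernel.calls)  # prints: 70 500
```

With 100 candidate values and a window of 5, the search performs 500 kernel launches. This matches the count quoted above for the windowed phase. A window of 1 would cut that to 100 but would let a single noisy run skew a candidate's response.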

aprokop · Feb 02 '24 23:02