ArborX

Switch HalfTraversal to APIv2

Open aprokop opened this issue 2 years ago • 4 comments

The main thing is to change the half-traversal callback from callback(int, int) to callback(value, value).

Note that this change is not backwards compatible. However, HalfTraversal is experimental, so it should be fine.
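To illustrate the difference between the two callback styles, here is a minimal host-only sketch. It does not use ArborX or Kokkos: `Point`, `IndexedPoint`, `halfTraversalByIndex` and `halfTraversalByValue` are hypothetical names, and a brute-force double loop stands in for the actual tree traversal. It only shows what the callback receives in each case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the data stored in the tree (not ArborX types).
struct Point
{
  float x, y, z;
};

struct IndexedPoint
{
  Point point;
  int index; // original index carried along with the geometry
};

inline float squaredDistance(Point const &a, Point const &b)
{
  float const dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Old style: the callback receives the indices of a matching pair.
template <typename Callback>
void halfTraversalByIndex(std::vector<Point> const &points, float radius,
                          Callback const &callback)
{
  for (std::size_t i = 0; i < points.size(); ++i)
    for (std::size_t j = i + 1; j < points.size(); ++j) // each pair once
      if (squaredDistance(points[i], points[j]) <= radius * radius)
        callback(static_cast<int>(i), static_cast<int>(j));
}

// New style: the callback receives the stored values themselves and
// unwraps whatever it needs (here, the index) from them.
template <typename Callback>
void halfTraversalByValue(std::vector<IndexedPoint> const &values,
                          float radius, Callback const &callback)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    for (std::size_t j = i + 1; j < values.size(); ++j) // each pair once
      if (squaredDistance(values[i].point, values[j].point) <=
          radius * radius)
        callback(values[i], values[j]);
}
```

A callback written for the old style, such as `[](int i, int j) { ... }`, would have to become something like `[](IndexedPoint const &a, IndexedPoint const &b) { ... }` and read `a.index` and `b.index` itself, which is why the change is not backwards compatible.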

aprokop, Jan 10 '24 19:01

I need to check that FDBSCAN is not affected. We don't have a benchmark for neighbor lists, so I can't say how much they were affected. But I expect some benefit, as I switched to APIv2 with points instead of boxes for the tree construction.

aprokop, Jan 10 '24 20:01

saturn dbscan $ ./ArborX_master_7abc0268 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 2 --impl fdbscan --verbose
-- construction     :      0.371
-- query+cluster    :      3.143
-- postprocess      :      0.516
total time          :      4.106
-----------------------------------------------
saturn dbscan $ ./ArborX_half_cfb2d3e2 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 2 --impl fdbscan --verbose
-- construction     :      0.375
-- query+cluster    :      3.358
-- postprocess      :      0.529
total time          :      4.340

About 7% slowdown in query+cluster (3.143 vs 3.358) on A100 for the 497M problem (eps=0.014, minPts=2).

saturn dbscan $ ./ArborX_master_7abc0268 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 10 --impl fdbscan --verbose
-- construction     :      0.379
-- query+cluster    :      4.487
---- neigh          :      0.836
---- query          :      3.408
-- postprocess      :      0.423
total time          :      5.368
-----------------------------------------------
saturn dbscan $ ./ArborX_half_cfb2d3e2 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 10 --impl fdbscan --verbose
-- construction     :      0.376
-- query+cluster    :      4.796
---- neigh          :      0.843
---- query          :      3.701
-- postprocess      :      0.407
total time          :      5.657

About 8.5% slowdown in query on A100 for the 497M problem (eps=0.014, minPts=10).

The cost likely comes from passing values by const ref to the FDBSCANCallback and then unwrapping them. I'm not sure how I feel about it. On one hand, that's a pretty hefty penalty for such a small change. On the other, the new interface for half traversal seems more flexible.

We may need to explore other cases where half traversal is useful but not possible with the current callback interface.

aprokop, Jan 11 '24 15:01

Moved the part of the PR that isn't related to switching the HalfTraversal API to v2 to #1024.

aprokop, Jan 11 '24 19:01

Rebased on #1024.

aprokop, Jan 12 '24 15:01

The following change seems to have fixed the performance issue:

--- a/src/details/ArborX_DetailsHalfTraversal.hpp
+++ b/src/details/ArborX_DetailsHalfTraversal.hpp
@@ -52,7 +52,7 @@ struct HalfTraversal

   KOKKOS_FUNCTION void operator()(int i) const
   {
-    auto const &leaf_value = HappyTreeFriends::getValue(_bvh, i);
+    auto const leaf_value = HappyTreeFriends::getValue(_bvh, i);
     auto const predicate = _get_predicate(leaf_value);

     int node = HappyTreeFriends::getRope(_bvh, i);

While it may have negative consequences for values of very large size, this is acceptable for our common use cases. In the future, we may specialize on sizeof(Value).
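One way such a specialization on sizeof(Value) could look is sketched below. This is an assumption-laden illustration, not the ArborX implementation: `max_copy_size`, `LocalLeafValue`, `getValue` and `visitLeaf` are hypothetical names, the 64-byte cutoff is arbitrary, and plain host C++17 is used in place of Kokkos device code. The idea is to copy small values into a local (as in the diff above) and keep a const reference only for large ones.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical cutoff: values up to this many bytes are copied into a local.
constexpr std::size_t max_copy_size = 64;

// Type of the local leaf variable: a const copy for small values,
// a const reference for large ones.
template <typename Value>
using LocalLeafValue =
    std::conditional_t<(sizeof(Value) <= max_copy_size), Value const,
                       Value const &>;

// Example value types, for illustration only.
struct SmallValue
{
  float xyz[3];
  int index;
};

struct LargeValue
{
  double data[64];
};

static_assert(!std::is_reference_v<LocalLeafValue<SmallValue>>);
static_assert(std::is_reference_v<LocalLeafValue<LargeValue>>);

// Stand-in for an accessor that returns a reference into the stored leaves.
template <typename Value>
Value const &getValue(Value const *storage, int i)
{
  return storage[i];
}

// Picks copy or reference at compile time, then hands the value to `f`.
template <typename Value, typename F>
auto visitLeaf(Value const *storage, int i, F const &f)
{
  LocalLeafValue<Value> leaf_value = getValue(storage, i);
  return f(leaf_value);
}
```

Where the cutoff should sit would have to be determined by benchmarking on the target hardware; the value here is only a placeholder.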

saturn dbscan ((fd39c111...)) $ for i in ./ArborX_master_466efb7c ./ArborX_half_fd39c111; do $i --binary --filename ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 2 --impl fdbscan --verbose --kokkos-device-id=2 --max-num-points 300000000; done
ArborX version    : 1.6 (dev)
ArborX hash       : 466efb7c
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 2
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.207
-- query+cluster    :      1.894
-- postprocess      :      0.202
total time          :      2.308

#clusters       : 18287721
#cluster points : 226764477 [75.59%]
#noise   points : 73235523 [24.41%]
ArborX version    : 1.6 (dev)
ArborX hash       : fd39c111
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 2
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.242
-- query+cluster    :      1.882
-- postprocess      :      0.203
total time          :      2.332

#clusters       : 18287721
#cluster points : 226764477 [75.59%]
#noise   points : 73235523 [24.41%]
saturn dbscan ((fd39c111...)) $ for i in ./ArborX_master_466efb7c ./ArborX_half_fd39c111; do $i --binary --filename ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 10 --impl fdbscan --verbose --kokkos-device-id=2 --max-num-points 300000000; done
ArborX version    : 1.6 (dev)
ArborX hash       : 466efb7c
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 10
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.241
-- query+cluster    :      2.641
---- neigh          :      0.449
---- query          :      2.098
-- postprocess      :      0.146
total time          :      3.034

#clusters       : 791387
#cluster points : 148503760 [49.50%]
#noise   points : 151496240 [50.50%]
ArborX version    : 1.6 (dev)
ArborX hash       : fd39c111
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 10
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.179
-- query+cluster    :      2.643
---- neigh          :      0.448
---- query          :      2.104
-- postprocess      :      0.146
total time          :      2.973

#clusters       : 791387
#cluster points : 148503760 [49.50%]
#noise   points : 151496240 [50.50%]

aprokop, Apr 09 '24 20:04

@dalg24 ready for re-review

aprokop, Apr 09 '24 23:04

I am a bit surprised by the variability of the construction timings in the results you posted. Is it something you have been seeing before?

Something else may have been running at the time and interfered. I reran it a few times today and saw a consistent 0.207-0.210.

aprokop, Apr 10 '24 14:04