Add new alpha-tree based dendrogram construction
This patch introduced a new parallel dendrogram construction algorithm. It is quite complex, and not easy to explain. I am happy to provide a draft of the paper I have, or to explain in person.
Couple builds fail because of the desul atomic_min for ints problem we saw in #782. Not worrying about that for now. Will before the final merge.
Added combined Boruvka+alpha algorithm. We need to discuss how to merge this PR.
Running on Summit.
HACC 497M (80M)
Separate dendrogram:
./ArborX_Benchmark_DBSCAN.exe --binary --filename hacc_497M.arborx --algorithm hdbscan --core-min-size 1 --verbose --max-num-points 80000000 --dendrogram alpha
ArborX version : 1.4 (dev)
ArborX hash : 6b50fd1b-dirty
Kokkos version : 4.0.99
algorithm : hdbscan
dendrogram : alpha
minpts : 1
filename : hacc_497M.arborx [binary, max_pts = 80000000]
samples : -1
verbose : true
Reading in "hacc_497M.arborx" in binary mode...done
Read in 80000000 3D points
-- mst : 1.564
-- dendrogram : 0.804
---- edge sort : 0.035
---- alpha edges : 0.147
---- alpha vertices : 0.088
---- alpha matrix : 0.086
---- sided parents : 0.077
---- compression : 0.013
---- parents : 0.174
total time : 2.371
Combined Boruvka + dendrogram:
./ArborX_Benchmark_DBSCAN.exe --binary --filename hacc_497M.arborx --algorithm hdbscan --core-min-size 1 --verbose --max-num-points 80000000 --dendrogram boruvka
ArborX version : 1.4 (dev)
ArborX hash : 6b50fd1b-dirty
Kokkos version : 4.0.99
algorithm : hdbscan
dendrogram : boruvka
minpts : 1
filename : hacc_497M.arborx [binary, max_pts = 80000000]
samples : -1
verbose : true
Reading in "hacc_497M.arborx" in binary mode...done
Read in 80000000 3D points
-- construction : 0.220
-- boruvka : 1.402
---- sided parents : 0.003
---- vertex parents : 0.007
-- edge parents : 0.078
total time : 1.682
normal100M2
Separate dendrogram
./ArborX_Benchmark_DBSCAN.exe --binary --filename normal100M2.arborx --algorithm hdbscan --core-min-size 5 --verbose --dendrogram alpha
ArborX version : 1.4 (dev)
ArborX hash : 6b50fd1b-dirty
Kokkos version : 4.0.99
algorithm : hdbscan
dendrogram : alpha
minpts : 5
filename : normal100M2.arborx [binary, max_pts = -1]
samples : -1
verbose : true
Reading in "normal100M2.arborx" in binary mode...done
Read in 100000000 2D points
-- mst : 1.898
-- dendrogram : 0.941
---- edge sort : 0.044
---- alpha edges : 0.171
---- alpha vertices : 0.219
---- alpha matrix : 0.045
---- sided parents : 0.094
---- compression : 0.008
---- parents : 0.211
total time : 2.841
Combined Boruvka + dendrogram:
./ArborX_Benchmark_DBSCAN.exe --binary --filename normal100M2.arborx --algorithm hdbscan --core-min-size 5 --verbose --dendrogram boruvka
ArborX version : 1.4 (dev)
ArborX hash : 6b50fd1b-dirty
Kokkos version : 4.0.99
algorithm : hdbscan
dendrogram : boruvka
minpts : 5
filename : normal100M2.arborx [binary, max_pts = -1]
samples : -1
verbose : true
Reading in "normal100M2.arborx" in binary mode...done
Read in 100000000 2D points
-- construction : 0.231
-- core distances : 0.507
-- boruvka : 1.295
---- sided parents : 0.003
---- vertex parents : 0.057
-- edge parents : 0.104
total time : 2.099
Summary
Combined Boruvka + dendrogram runs ~7-10% slower than a standalone MST. But the dendrogram part is 5-7x faster than computing dendrogram separately, depending on an example (and like 70x faster than union-find). The main benefit is that we don't have to recalculate any of the alpha edges, alpha vertices, incidence matrices, etc. All that information is already available as part of regular Boruvka iterations.
Converted to draft as I think #845 should be merged first.
Rebased on #845.