ArborX
ArborX copied to clipboard
Add a faster implementation of the union-find for Serial
The patch tries to solve the problem of slow Serial
atomics when Kokkos compiled with other backends. This patch is not strictly necessary right now, as one could compile Kokkos with just the Serial
backend. However, it becomes necessary once I introduce a dendrogram construction algorithm using union-find, which will run MST construction on the device, but construct union-find on CPU using a single thread. In that context, it's not possible to fix it by just compiling Kokkos.
Drive-by change: labels_
-> _labels
The following code for 20,000,000 edges went from 0.46 to 0.23 when using Serial backend but compiled with others.
Kokkos::parallel_for(
"Bechmark::union-find",
Kokkos::RangePolicy<ExecutionSpace>(ExecutionSpace{}, 0, num_edges),
KOKKOS_LAMBDA(int e) {
int i = edges(e).source;
int j = edges(e).target;
for (int k : {i, j})
{
auto child = labels(union_find.representative(k));
if (child != UNDEFINED)
parents(child) = e;
}
union_find.merge(i, j);
labels(union_find.representative(i)) = e;
});