[TASK] Reuse `all_neighbors` APIs in CAGRA ACE build
The CAGRA augmented core extraction (ACE) build method introduced in PR #1404 supports building CAGRA indices on very large datasets that exceed GPU memory capacity. To this end, it partitions the dataset, much like the batched `all_neighbors` approach does. This issue tracks the overlap between the two and the potential reuse of `all_neighbors` routines in ACE to minimize code duplication.
1. `ace_get_partition_labels`: This is similar to running the `all_neighbors` `get_centroids_on_data_subsample` and `assign_clusters` routines (see https://github.com/rapidsai/cuvs/pull/1404#discussion_r2411808609).
   - `get_centroids_on_data_subsample` runs balanced k-means on a subsample of the dataset to get centroids. This uses balanced `kmeans::fit`, which only supports the data types `float` and `int8_t`. Centroids have to be of type `float`. The `all_neighbors` implementation uses one generic template type for both the data and the centroids, which would force type `float` for both. We could make the centroids type explicit and convert the subsampled dataset to type `float` if other types are provided. What do you think @jinsol? Another difference is the number of samples, which we might need to add as an additional parameter.
   - `assign_clusters` assigns each data point to its top `overlap_factor` (2 for ACE) clusters. It uses `brute_force::search`, which expects `float` or `half`. However, the main issue is that the `global_nearest_cluster` (partition labels in ACE notation) are expected as a matrix view of the index type. This would be `int64_t` to match the expected extent type, so we would end up with twice the memory requirements. We could use `int64_t` during the batched `brute_force::search` and then convert to a 32-bit index type. This would also require changes in the `all_neighbors` implementation.
2. `ace_create_forward_and_backward_lists` has some overlap with the `all_neighbors` `get_inverted_indices` (see https://github.com/rapidsai/cuvs/pull/1404#discussion_r2411915915). After analyzing the overlap, I believe the two routines differ significantly: `get_inverted_indices` forms a single cluster from the `overlap_factor = 2` clusters, whereas ACE needs the primary and augmented partitions to stay independent. We also have to separate ACE's in-memory and disk paths and require a forward mapping. I don't think unifying these is desirable.
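
To make the 32-bit label conversion proposed for `assign_clusters` more concrete, here is a minimal host-side sketch, not the cuVS implementation. It only illustrates the memory pattern: a per-batch `int64_t` scratch buffer that is narrowed into a full-size `uint32_t` label matrix. `fake_search_batch` and `assign_labels_narrowed` are hypothetical names, and the fake search just stands in for a batched `brute_force::search` call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a batched top-k cluster search. It fills int64_t
// labels for rows [row_begin, row_end); row r gets clusters (r + k) % n_clusters.
void fake_search_batch(std::size_t row_begin,
                       std::size_t row_end,
                       std::size_t overlap_factor,
                       std::int64_t n_clusters,
                       std::vector<std::int64_t>& batch_labels)
{
  batch_labels.resize((row_end - row_begin) * overlap_factor);
  for (std::size_t r = row_begin; r < row_end; ++r) {
    for (std::size_t k = 0; k < overlap_factor; ++k) {
      batch_labels[(r - row_begin) * overlap_factor + k] =
        static_cast<std::int64_t>((r + k) % static_cast<std::size_t>(n_clusters));
    }
  }
}

// Only the per-batch scratch buffer is int64_t. The full
// n_rows x overlap_factor label matrix (row-major) is stored as uint32_t.
std::vector<std::uint32_t> assign_labels_narrowed(std::size_t n_rows,
                                                  std::size_t overlap_factor,
                                                  std::int64_t n_clusters,
                                                  std::size_t batch_size)
{
  if (batch_size == 0) { throw std::invalid_argument("batch_size must be positive"); }
  if (n_clusters <= 0 ||
      n_clusters > static_cast<std::int64_t>(std::numeric_limits<std::uint32_t>::max())) {
    throw std::invalid_argument("n_clusters must fit in uint32_t");
  }

  std::vector<std::uint32_t> labels(n_rows * overlap_factor);
  std::vector<std::int64_t> scratch;  // reused across batches
  for (std::size_t begin = 0; begin < n_rows; begin += batch_size) {
    const std::size_t end = std::min(begin + batch_size, n_rows);
    fake_search_batch(begin, end, overlap_factor, n_clusters, scratch);
    for (std::size_t i = 0; i < scratch.size(); ++i) {
      // Cluster ids are bounded by n_clusters, so narrowing is lossless.
      assert(scratch[i] >= 0 && scratch[i] < n_clusters);
      labels[begin * overlap_factor + i] = static_cast<std::uint32_t>(scratch[i]);
    }
  }
  return labels;
}
```

With `overlap_factor = 2`, the stored labels take `2 * n_rows * 4` bytes instead of `2 * n_rows * 8` bytes, and the `int64_t` scratch is bounded by `batch_size * overlap_factor` elements. Whether the conversion should happen on the device and per batch like this is an open design question for the actual change.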
Thanks @julianmi for leaving this issue.
For 1: it seems like ACE also uses balanced k-means and currently converts the sampled data to `float`, so I think that would be fine!
Marking another discussion: can we reuse `all_neighbors::gpu_batched_build` for the in-memory path? See https://github.com/rapidsai/cuvs/pull/1404/files#r2418194555