
[FEA] Enable NN Descent for larger data dimensions

Open jinsolp opened this issue 1 year ago • 0 comments

Description

Currently, GNND::build() launches preprocess_data_kernel, which requires a shared memory size of:

sizeof(Data_t) * ceildiv(build_config_.dataset_dim, static_cast<size_t>(raft::warp_size())) * raft::warp_size()

Assuming sizeof(Data_t) = 4 for standard cases such as fp32 and a shared memory limit of 48KB, this restricts the dataset dimension to roughly 12000 (at most 12288 = 49152 bytes / 4 bytes per element).
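To make the arithmetic concrete, here is a minimal host-side sketch of the size calculation. It is not cuVS code: `kWarpSize`, `kSmemLimitBytes`, `smem_bytes` and `max_dim` are hypothetical names used only for illustration, and the warp size of 32 and the 48KB limit are assumptions taken from the description above. `ceildiv` is a local stand-in for the helper used in the expression.

```cpp
#include <cstddef>

// Assumptions for illustration only: warp size 32 and a 48KB shared memory limit.
constexpr std::size_t kWarpSize       = 32;
constexpr std::size_t kSmemLimitBytes = 48 * 1024;

// Local stand-in for the ceildiv helper: integer division rounded up.
constexpr std::size_t ceildiv(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Same shape as the expression in the issue: the dimension is padded up to a
// multiple of the warp size, then multiplied by the element size.
constexpr std::size_t smem_bytes(std::size_t elem_size, std::size_t dim)
{
  return elem_size * ceildiv(dim, kWarpSize) * kWarpSize;
}

// Largest dimension whose padded row still fits in the assumed limit.
constexpr std::size_t max_dim(std::size_t elem_size)
{
  return (kSmemLimitBytes / elem_size) / kWarpSize * kWarpSize;
}

// fp32 (4 bytes per element): the limit is reached at 12288 dimensions.
static_assert(max_dim(4) == 12288, "fp32 rows fit up to dim 12288");
static_assert(smem_bytes(4, 12288) == kSmemLimitBytes, "dim 12288 uses exactly 48KB");

// A 20000-dimensional fp32 row needs 80000 bytes, well past the assumed limit.
static_assert(smem_bytes(4, 20000) == 80000, "dim 20000 needs 80000 bytes");
static_assert(smem_bytes(4, 20000) > kSmemLimitBytes, "dim 20000 does not fit");
```

Under these assumptions, a 20000-dimensional fp32 dataset would need about 1.6x the available shared memory per row, which is why the kernel cannot be launched as written for such inputs.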

It would be nice to support datasets with larger dimensions (> 20000), as is the case for single-cell RNA datasets (e.g. datasets here).

jinsolp • Aug 14 '24 00:08