[BUG] CAGRA graph build copies dataset multiple times
Describe the bug
When a dataset that already resides in device memory is supplied to cagraIndexBuild (or the equivalent C++ functions), for example because it was copied there earlier (e.g. to remove strides as a workaround for #1455) or because it is the output of another GPU computation such as quantization, a new copy of the dataset is created in device memory. The build therefore needs at least twice the dataset's size on the device, so cagraIndexBuild is likely to fail for larger datasets.
Steps/Code to reproduce bug
On a GPU with X GB of memory, supply to cagraIndexBuild a DLManagedTensor of device type kDLCUDA (i.e. the dataset pointer is a device address) whose size is roughly 0.6 * X GB, a bit more than half the available RAM (e.g. a 13 GB dataset on a 24 GB GPU). cagraIndexBuild fails with an out-of-memory error.
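A minimal sketch of the setup: the DLPack construction below is standard, but the dataset dimensions are made up to hit ~13 GB, and the build call is left as a comment because the exact C API entry points (index/params creation, the cagraIndexBuild signature) vary between releases:

```c
/* Repro sketch: a compact row-major float32 dataset in device memory,
 * wrapped in a DLManagedTensor with device type kDLCUDA. */
#include <cuda_runtime.h>
#include <dlpack/dlpack.h>
#include <stdint.h>

int main(void) {
  /* ~13 GB of float32 on a 24 GB GPU: 3.5M rows x 960 dims (made-up sizes) */
  int64_t n_rows = 3500000, dim = 960;
  float* d_data = NULL;
  cudaMalloc((void**)&d_data, (size_t)n_rows * dim * sizeof(float));
  /* ... d_data filled by an earlier GPU stage (copy, quantization, ...) ... */

  int64_t shape[2] = {n_rows, dim};
  DLManagedTensor dataset = {
      .dl_tensor = {.data        = d_data,
                    .device      = {.device_type = kDLCUDA, .device_id = 0},
                    .ndim        = 2,
                    .dtype       = {.code = kDLFloat, .bits = 32, .lanes = 1},
                    .shape       = shape,
                    .strides     = NULL, /* compact row-major */
                    .byte_offset = 0}};

  /* cagraIndexBuild(res, params, &dataset, &index);
   *   ^ exact signature varies by release; the call allocates a second
   *     device copy of the dataset internally, so it fails with OOM here. */
  (void)dataset;
  cudaFree(d_data);
  return 0;
}
```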
Expected behavior
cagraIndexBuild should provide guidance or a mechanism to avoid this:
- an option to skip the copy, accepting the performance hit, could be acceptable
- instructions on how to shape the data (pre-strided/padded) so the copy can be avoided and a "fast path" taken (see the sketch after this list)
Any way to avoid the issue and let more data fit and be processed on the GPU would help.
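To illustrate the second bullet, a hedged sketch of what "pre-padded" could mean on the caller's side: allocate each row with the pitch the search kernels expect, so the library could wrap a strided view instead of copying. The 16-byte alignment figure is an assumption for illustration; the real requirement (if any) is exactly what this issue asks to have documented:

```c
/* Hypothetical pre-padded allocation; ASSUMED_ROW_ALIGN is a made-up
 * value, not a documented CAGRA requirement. */
#include <cuda_runtime.h>
#include <stdint.h>

#define ASSUMED_ROW_ALIGN 16 /* bytes; assumption for illustration only */

/* Allocates n_rows rows of dim floats, each row padded to the assumed
 * alignment. Returns the device pointer and the row stride in elements. */
float* alloc_padded_rows(int64_t n_rows, int64_t dim, int64_t* stride_elems) {
  size_t row_bytes = (size_t)dim * sizeof(float);
  size_t pitch =
      (row_bytes + ASSUMED_ROW_ALIGN - 1) / ASSUMED_ROW_ALIGN * ASSUMED_ROW_ALIGN;
  float* d_ptr = NULL;
  cudaMalloc((void**)&d_ptr, (size_t)n_rows * pitch);
  *stride_elems = (int64_t)(pitch / sizeof(float));
  return d_ptr; /* row i starts at d_ptr + i * (*stride_elems) */
}
```

A DLManagedTensor over such a buffer would carry strides = {stride_elems, 1}; if the build path recognized an already-padded layout, it could wrap a view instead of allocating a copy.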
Additional context
Another issue that recently came to light is that nn-descent wants to copy the whole dataset into float16. That is separate from the strided/padded dataset issue, but it is another source of GPU memory pressure from extra dataset copies (for the ~13 GB float32 example above, a half-precision copy adds roughly another 6.5 GB).
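To make the concern concrete, a back-of-envelope footprint calculation, assuming (worst case) that the caller's buffer, the internal float32 copy, and the nn-descent float16 copy are all live at once; whether they actually overlap in time depends on the implementation and is an assumption here:

```c
/* Worst-case device memory footprint for one build of the ~13 GB example. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
  int64_t n_rows = 3500000, dim = 960; /* same made-up ~13 GB dataset */
  double gib = 1024.0 * 1024.0 * 1024.0;
  double input_f32 = (double)n_rows * dim * 4; /* caller's device buffer */
  double copy_f32 = input_f32;                 /* internal build copy    */
  double copy_f16 = (double)n_rows * dim * 2;  /* nn-descent half copy   */
  printf("input %.1f + copy %.1f + fp16 %.1f = %.1f GiB\n", input_f32 / gib,
         copy_f32 / gib, copy_f16 / gib,
         (input_f32 + copy_f32 + copy_f16) / gib);
  return 0;
}
```

That prints roughly 12.5 + 12.5 + 6.3 = 31.3 GiB, well beyond a 24 GB card even before the graph itself and working memory are counted.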