[QST] How to predict the memory required to build an index with CagraIndex using IVFPQ
What is your question? I am trying to build a CagraIndex (intermediate graph built using IVF-PQ) with 10M 768-D documents. Whenever I try to build the index, I get OOM exceptions.
My Machine:
- g5.2xlarge
- CPU RAM: 32 GB
- GPU memory: 24 GB
- Using the Faiss Python bindings to build the index. I have validated that Faiss is using cuVS and not RAFT.
I would like to know how I can predict the machine size required to build the index.
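For context, here is a minimal sketch of the kind of build code being described (not the exact script; class and field names such as `GpuIndexCagraConfig`, `IVFPQBuildCagraConfig`, and `graph_build_algo_IVF_PQ` assume a cuVS-enabled FAISS build and may differ slightly between versions):

```python
# Minimal sketch of a CAGRA build with an IVF-PQ intermediate graph via the FAISS
# Python bindings. Class/field names assume a cuVS-enabled FAISS build.
import faiss
import numpy as np

d = 768
xb = np.random.rand(100_000, d).astype('float32')  # stand-in for the real 10M x 768 dataset

res = faiss.StandardGpuResources()

ivf_pq_params = faiss.IVFPQBuildCagraConfig()       # options for the IVF-PQ intermediate build
config = faiss.GpuIndexCagraConfig()
config.build_algo = faiss.graph_build_algo_IVF_PQ   # build the intermediate graph with IVF-PQ
config.ivf_pq_params = ivf_pq_params

index = faiss.GpuIndexCagra(res, d, faiss.METRIC_L2, config)
index.train(xb)                                     # the CAGRA graph is built during train()
```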
@tfeher this could potentially be the same issue you are running into. I wonder now if something didn't get ported from RAFT properly, since from what I recall, @navneet1v was able to do this with FAISS+RAFT during his PoC.
cc @divyegala we should get to the bottom of this behavior and fix this for our OpenSearch friends.
@cjnolet any updates on this? @divyegala , @tfeher
@divyegala can you confirm this is fixed with your recent FAISS PR? @navneet1v are you able to provide a few lines of the Python code you are using to build the CAGRA index (ideally including the line of code where it fails)?
@cjnolet please find the sample code here:
Creating the dataset: https://github.com/navneet1v/VectorSearchForge/blob/main/custom-faiss-installed-image/create-dataset.py
python create-dataset.py
Running the index build: https://github.com/navneet1v/VectorSearchForge/blob/main/custom-faiss-installed-image/faiss-test.py
python faiss-test.py
This is the error I get when I try to build a 7M 768-D dataset on a g5.2xlarge machine, which has 24 GB of GPU memory and 32 GB of CPU RAM.
appuser@d254f04ba6e0:/tmp$ python faiss-test.py
file is written, loading file now..
(7000000, 768)
Creating GPU Index.. with IVF_PQ
using ivf_pq::index_params nrows 7000000, dim 768, n_lits 2645, pq_dim 32
Traceback (most recent call last):
File "/tmp/faiss-test.py", line 99, in <module>
#ids = [i for i in range(len(xb))]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/faiss-test.py", line 47, in indexData
cagraIVFPQIndex = faiss.GpuIndexCagra(res, d, metric, cagraIndexConfig)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/faiss-test.py", line 53, in indexDataInIndex
def indexDataInIndex(index: faiss.Index, ids, xb):
^^^^^^^^^^^^^^^
File "/tmp/faiss/build/faiss/python/faiss/class_wrappers.py", line 298, in replacement_train
self.train_c(n, swig_ptr(x))
File "/tmp/faiss/build/faiss/python/faiss/swigfaiss.py", line 11556, in train
return _swigfaiss.IndexIDMap_train(self, n, x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: std::bad_alloc
@navneet1v can you try building on top of this PR and setting store_dataset = false?
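(For illustration, a sketch of what that setting might look like from Python; this assumes the PR exposes a `store_dataset` flag on `GpuIndexCagraConfig`, so verify the exact field name against the PR.)

```python
# Hypothetical illustration of the suggestion above; assumes the PR exposes a
# store_dataset flag on GpuIndexCagraConfig (verify the exact field name in the PR).
config = faiss.GpuIndexCagraConfig()
config.store_dataset = False   # do not attach the raw dataset to the GPU-built index
```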
@divyegala I did the build on top of your PR. I found out that the issue was because of https://github.com/navneet1v/VectorSearchForge/blob/main/custom-faiss-installed-image/faiss-test.py#L37,
where I was setting cagraIndexIVFPQConfig.kmeans_trainset_fraction = 10 rather than a fractional value, thinking 10 meant "use 10% of the data for k-means training". I had been using that number from day 1, when we had the initial conversation. After changing it to a lower value like 0.3, I am able to build the index.
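For anyone hitting the same thing, a sketch of the fix (the field name comes from the linked script; the constructor name `IVFPQBuildCagraConfig` is assumed from a cuVS-enabled FAISS build):

```python
# Sketch of the fix: kmeans_trainset_fraction is a fraction in (0, 1], not a percentage.
cagraIndexIVFPQConfig = faiss.IVFPQBuildCagraConfig()
cagraIndexIVFPQConfig.kmeans_trainset_fraction = 0.3   # use 30% of the vectors for k-means training
# previously: kmeans_trainset_fraction = 10  (intended as "10%"), which led to the OOM above
```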
@divyegala can we go ahead and merge the PR on the FAISS side?
@navneet1v the PR is merged on the FAISS side now: https://github.com/facebookresearch/faiss/pull/4173#issuecomment-2666681664
@divyegala and @cjnolet I would now like to understand the behavior of k-means training. It seems that if we are not attaching the dataset to the index, the number of vectors provided for k-means training becomes the scaling factor. In my testing I see this:
- If GPU memory is 24 GB, then you cannot go above ~13 GB of raw vectors (see the rough arithmetic below). Can you please share some guidance on why this is happening?
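For scale, a quick back-of-the-envelope on the raw-vector sizes involved (context for the numbers above, not an explanation of the limit):

```python
# Back-of-the-envelope sizes for float32, 768-D vectors (context for the 13 GB observation).
bytes_per_vector = 768 * 4                              # 3072 bytes per vector
print(f"{7_000_000 * bytes_per_vector / 1e9:.1f} GB")   # ~21.5 GB for the 7M dataset
print(f"{10_000_000 * bytes_per_vector / 1e9:.1f} GB")  # ~30.7 GB for the 10M dataset
print(f"{13e9 / bytes_per_vector / 1e6:.1f}M vectors")  # ~4.2M vectors fit in 13 GB
```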
@cjnolet, @divyegala can you provide some info here?
Looping in @tfeher @achirkin for their insights here
Hi @navneet1v, I could run CAGRA index creation using the IVF-PQ algorithm through the FAISS Python API. When using a memory-mapped dataset with shape 1M x 1536 (6.144 GB), I see peak GPU memory usage below 3.5 GB.
When I use 5M vectors (1536-dim, 30.72 GB), it peaked at 5.3 GB.
This data was collected on an A30 GPU with 24 GiB memory.
Note that this was run without customizing the IVF-PQ build options for CAGRA, because this assignment did not work in my build (cuVS branch-25.06 head (b8012181) and FAISS main (70c45378e)).
To narrow down where the extra allocations are originating in your tests, could you run your script with RMM logging enabled, and share the log file?
import rmm
rmm.reinitialize(logging=True, log_file_name='/tmp/rmm_log.csv')
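As a side note, here is a minimal sketch of how the resulting CSV could be summarized into a peak-usage number. It assumes the default column layout written by RMM's logging adaptor (Thread, Time, Action, Pointer, Size, Stream), so check your log's header first:

```python
# Sketch: compute peak device-memory usage tracked by the RMM log.
# Assumes the default columns (Thread, Time, Action, Pointer, Size, Stream);
# RMM may also append a device id to the file name (e.g. rmm_log.csv -> rmm_log.dev0.csv).
import pandas as pd

log = pd.read_csv('/tmp/rmm_log.csv')

current = peak = 0
for _, row in log.iterrows():
    if row['Action'] == 'allocate':
        current += row['Size']
    elif row['Action'] == 'free':
        current -= row['Size']
    peak = max(peak, current)

print(f"peak RMM-tracked usage: {peak / 1e9:.2f} GB")
```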
@tfeher
Here are the results of the RMM logs, when using mmap to load the vectors:
Here are the results of the RMM logs, when not using mmap to load the vectors:
The data comes from the 1536-dimension, 1M-vector dataset.
I will generate the graphs from these files in the next few hours, and will post them to this issue.
@tfeher
Graph when using mmap:
Graph when not using mmap:
The memory usage stats don't match what I see when I use nvidia-smi; these graphs look exactly the same. Is it possible that RMM logging isn't capturing all of the GPU memory allocations? When I run nvidia-smi in parallel in a separate terminal window, it shows the GPU memory steadily increasing until it hits ~10.5 GB, before RMM logs anything to the CSV file (this is when I don't use mmap).
For additional context: the reason I thought there was a relationship between GPU memory usage and the vector loading method (mmap vs. no mmap) in the first place was that I was trying to test the 10M 768-D dataset on a g5.2xlarge instance, and the instance kept crashing. Before the instance crashed, I always saw the GPU memory reported by nvidia-smi steadily increase until it hit the 24 GB GPU memory limit. Then I saw that @navneet1v had an example that used mmap to load the vectors, so I changed my code to do the same. After that, the GPU memory reported by nvidia-smi never came close to hitting 24 GB, and the instance did not crash.
Thank you for posting the graphs!
> Is it possible that RMM logging isn't capturing all of the GPU memory allocations?
Yes, it only logs allocations made through the RMM API, which should include all allocations done in cuVS, and also any regular device allocations in FAISS (when compiled with cuVS).
I think these logs are already useful in establishing that cuVS works as expected. We still need to understand the complete picture for the memory usage of the FAISS-integrated cuVS tests. Let's discuss this further on https://github.com/facebookresearch/faiss/issues/4274
> When I run nvidia-smi in parallel in a separate terminal window, it shows the GPU memory steadily increasing until it hits ~10.5 GB, before RMM logs anything to the CSV file.
I did not observe that.