Same index fits in GPU memory with faiss-gpu 1.6.5 but not with 1.7.4

Open raraz15 opened this issue 1 year ago • 1 comments

Summary

I use an NVIDIA GeForce RTX 2080 Ti for the experiments below.

Using faiss-gpu 1.6.5, I was able to store 56M vectors in an IVFPQ index using the reproduction code below. GPU memory usage was 6903MiB / 11264MiB, so the memory was not even full.

When I updated to 1.7.4 and ran the same code on the same data, the data no longer fit in GPU memory and I got the out-of-memory error shown after the reproduction code.
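As a rough sanity check on that number, here is a back-of-the-envelope estimate of the index payload alone. It is only a sketch: the helper name `estimate_ivfpq_bytes` is made up for illustration, and it assumes 1 byte per sub-quantizer code (nbits = 8) and 8-byte vector ids stored on the GPU. It ignores the CUDA context, faiss temporary memory, and any slack in the inverted lists.

```python
def estimate_ivfpq_bytes(n_vectors, code_sz, id_bytes=8):
    """Approximate bytes needed for PQ codes plus vector ids.

    Assumptions (not taken from the faiss source):
    - nbits = 8, so each of the code_sz sub-quantizer codes takes 1 byte
    - ids are stored next to the codes as id_bytes-wide integers
    """
    codes = n_vectors * code_sz
    ids = n_vectors * id_bytes
    return codes + ids


MIB = 1024 ** 2
total = estimate_ivfpq_bytes(56_000_000, 64)
print(f"estimated payload: {total / MIB:.1f} MiB")
```

Under these assumptions the payload comes out well below the 6903MiB reported for 1.6.5, which would be consistent with the rest being allocator overhead rather than the codes themselves.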

Platform

Linux

1.6.5 was installed from conda. 1.7.4 was installed with: `conda install -c pytorch -c nvidia faiss-gpu=1.7.4 mkl=2021 blas=1.0=mkl`

Running on:

  • [ ] CPU
  • [x ] GPU

Interface:

  • [ ] C++
  • [x ] Python

Reproduction instructions

    import faiss

    # train_data, train_data_shape and nprobe are defined elsewhere.
    n_centroids = 256
    code_sz = 64  # power of 2
    nbits = 8  # nbits must be 8, 12 or 16. The dimension d should be a multiple of M (here code_sz).

    N, d = train_data_shape

    quantizer = faiss.IndexFlatL2(d)
    index_ivf = faiss.IndexIVFPQ(quantizer, d, n_centroids, code_sz, nbits)

    GPU_OPTIONS = faiss.GpuClonerOptions()
    # use float16 table to avoid https://github.com/facebookresearch/faiss/issues/1178
    GPU_OPTIONS.useFloat16 = True

    GPU_RESOURCES = faiss.StandardGpuResources()  # Use a single GPU
    index_ivf = faiss.index_cpu_to_gpu(GPU_RESOURCES, 0, index_ivf, GPU_OPTIONS)

    index_ivf.train(train_data)
    index_ivf.add(train_data)
    index_ivf.nprobe = nprobe

RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /home/circleci/miniconda/conda-bld/faiss-pkg_1681998300314/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type IVFLists dev 0 space Device stream 0xfcdf480 size 54607872 bytes (cudaMalloc error out of memory [2])

Any idea what might have changed so drastically? How can I fit the same data?
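For reference, a variation of the reproduction code that might lower peak GPU memory is sketched below. It has not been verified against 1.7.4, and whether it avoids the allocation failure is an open question. The function names `batch_ranges` and `build_gpu_index`, the batch size, and the temporary-memory cap are illustrative choices. The faiss calls used are `GpuClonerOptions.indicesOptions` (with `faiss.INDICES_CPU`), `GpuClonerOptions.reserveVecs`, and `StandardGpuResources.setTempMemory`.

```python
def batch_ranges(n_vectors, batch_size):
    """Yield (start, stop) pairs that cover range(n_vectors) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_vectors, batch_size):
        yield start, min(start + batch_size, n_vectors)


def build_gpu_index(train_data, n_centroids=256, code_sz=64, nbits=8,
                    batch_size=1_000_000,
                    temp_memory_bytes=256 * 1024 * 1024):
    """Hypothetical helper: same index as in the issue, with memory knobs set."""
    import faiss  # imported lazily so batch_ranges works without faiss

    n, d = train_data.shape
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFPQ(quantizer, d, n_centroids, code_sz, nbits)

    options = faiss.GpuClonerOptions()
    options.useFloat16 = True
    # Assumption: keeping ids on the host frees some device memory.
    options.indicesOptions = faiss.INDICES_CPU
    # Assumption: reserving space up front reduces regrowth of the lists.
    options.reserveVecs = n

    resources = faiss.StandardGpuResources()
    # Cap the scratch memory faiss sets aside (value chosen arbitrarily).
    resources.setTempMemory(temp_memory_bytes)

    gpu_index = faiss.index_cpu_to_gpu(resources, 0, index, options)
    gpu_index.train(train_data)
    # Add in batches instead of a single add() over all vectors.
    for start, stop in batch_ranges(n, batch_size):
        gpu_index.add(train_data[start:stop])
    # Return the resources too so they stay alive as long as the index.
    return gpu_index, resources
```

Usage would be `index_ivf, GPU_RESOURCES = build_gpu_index(train_data)` followed by `index_ivf.nprobe = nprobe`, with `train_data` a float32 NumPy array as in the reproduction code.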

raraz15, Dec 05 '23 23:12

Update: I could use 1.7.2 without the above issues, so the error should be related to a change after this version.

raraz15, Dec 08 '23 22:12