
Multi-GPU OOM on 1.7.4 release

Open fferroni opened this issue 1 year ago • 5 comments

Summary

Using new 1.7.4 compiled from source, getting OOM on GPUs with total 240GB memory where previously on 1.7.2 it was not the case (using only 128GB, but a previous generation).

Platform

OS: Ubuntu 20.04-22.04
Faiss version: v1.7.4 release
Installed from: compiled from source
Faiss compilation options: cmake -B build . -DFAISS_ENABLE_GPU=ON -DFAISS_ENABLE_PYTHON=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Release -DFAISS_OPT_LEVEL=avx2 -DCMAKE_CUDA_ARCHITECTURES="60;70;72;75;80;86"
Built in a container nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 that also has libopenblas-dev and python3.8.

Running on:

  • [ ] CPU
  • [x] GPU

Interface:

  • [ ] C++
  • [x] Python

Reproduction instructions

Using the new release 1.7.4 versus an older version (1.7.2), I am observing higher GPU memory consumption during indexing. Previously, I could fit ~130M 1024-dimensional float vectors with an IVF20000,SQ4 index on 8x V100 16GB GPUs (128GB total). With the new release, I get a GPU OOM even on 3x A100 80GB (240GB total), using the same configuration.

import faiss

cloner_options = faiss.GpuMultipleClonerOptions()
cloner_options.shard = True  # shard the index across GPUs rather than replicating it
index = faiss.index_cpu_to_all_gpus(trained, cloner_options, ngpu=nb_gpus)
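For scale, a quick back-of-envelope of the inverted-list payload (my own rough model, assuming SQ4 stores 4 bits per dimension plus an int64 id per vector on the GPU):

n_vectors = 130_000_000        # "~130m" vectors; the exact count may be somewhat higher
dim = 1024
code_bytes = dim * 4 // 8      # 512 bytes per vector with 4-bit scalar quantization
id_bytes = 8                   # int64 id stored alongside each code
total_gib = n_vectors * (code_bytes + id_bytes) / 2**30
print(f"{total_gib:.1f} GiB")  # ~63 GiB at exactly 130M vectors, in the same ballpark
                               # as the ~71 GiB expectation mentioned below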

Have some defaults changed in the cloner options between 1.7.2 and 1.7.4 that would allocate more memory? Based on the number of vectors and the code size, I would have expected around 71 GiB to be needed. The data itself has not changed, nor have the pre-trained index weights.
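One way to check is to pin every relevant cloner option explicitly on both versions instead of relying on the defaults. A minimal sketch, reusing trained and nb_gpus from the snippet above and assuming the GpuMultipleClonerOptions attributes exposed by the 1.7.x Python bindings:

import faiss

co = faiss.GpuMultipleClonerOptions()
co.shard = True                           # shard inverted lists across GPUs instead of replicating
co.useFloat16 = False                     # whether to use float16 GPU storage where supported
co.useFloat16CoarseQuantizer = False      # whether the coarse quantizer is stored in float16
co.usePrecomputed = False                 # precomputed tables (IVFPQ) cost extra GPU memory
co.indicesOptions = faiss.INDICES_64_BIT  # how vector ids are stored on the GPU
index = faiss.index_cpu_to_all_gpus(trained, co, ngpu=nb_gpus)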

Thank you.

fferroni avatar Aug 01 '23 09:08 fferroni

@wickedfoo any idea why the memory usage would change between 1.7.2 and 1.7.4?

mdouze avatar Aug 03 '23 08:08 mdouze

I tried another experiment, this time running both versions on 8x V100 16GB GPUs with identical data and the same training file.

1.7.2 (from here) Screenshot 2023-08-08 at 15 53 13

1.7.4 (compiled from source as described above) Screenshot 2023-08-08 at 15 57 04

The error is:

RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /faiss-1.7.4/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type IVFLists dev 0 space Device stream 0x5654faae4600 size 501432320 bytes (cudaMalloc error out of memory [2])

Admittedly, even the 1.7.2 version was quite close to the limit. However, going to 3x A100 with 240GB total doesn't seem to help either.
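One thing that might buy back some headroom while debugging this (not something tried in this report) is to cap the temporary scratch buffer that each StandardGpuResources reserves, by cloning with explicit per-GPU resources. A hedged sketch, assuming the index_cpu_to_gpu_multiple_py helper from the Python bindings and reusing the trained index from above:

import faiss

ngpu = faiss.get_num_gpus()
resources = []
for _ in range(ngpu):
    res = faiss.StandardGpuResources()
    res.setTempMemory(256 * 1024 * 1024)  # cap the scratch space at 256 MiB per GPU
    resources.append(res)

co = faiss.GpuMultipleClonerOptions()
co.shard = True
index = faiss.index_cpu_to_gpu_multiple_py(resources, trained, co,
                                           gpus=list(range(ngpu)))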

fferroni avatar Aug 08 '23 14:08 fferroni

Hi @mdouze @wickedfoo, I also compiled 1.7.2 using the same script and options as 1.7.4 above (rather than relying on the pip package, which is unsupported and has a problem with A100s, as reported in a separate issue).

I can see that 1.7.2 also works fine on 4x A100 80GB GPUs: they all sit comfortably at 27GB each (108GB total), which is similar to the usage with 8x V100 16GB (~116GB total).

Screenshot 2023-08-10 at 11 43 45

So indeed, something changed between 1.7.2 and 1.7.4. It is not a GPU architecture issue, nor a difference in compilation options.

fferroni avatar Aug 10 '23 10:08 fferroni

Hello again. I also compiled 1.7.3 and there are no issues there either, so the regression must have been introduced between 1.7.3 and 1.7.4. Hope that narrows things down further.

1.7.3 Screenshot 2023-08-10 at 13 39 47

fferroni avatar Aug 10 '23 12:08 fferroni

I observe a similar trend between 1.6.5 and 1.7.4.

https://github.com/facebookresearch/faiss/issues/3160

raraz15 avatar Dec 05 '23 23:12 raraz15