
cudaMalloc error: out of memory

Open yangsp5 opened this issue 2 years ago • 4 comments

Summary

Platform

OS: Linux Ubuntu 18.04

Faiss version: faiss 1.7.2

Installed from: conda install faiss-gpu

Faiss compilation options:

Running on:

  • [ ] CPU
  • [√] GPU

Interface:

  • [ ] C++
  • [√] Python

Reproduction instructions

  • Error info:
Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /home/conda/feedstock_root/build_artifacts/faiss-split_1663108094389/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type FlatData dev 0 space Device stream 0x55dfe2df9eb0 size 42466784256 bytes (cudaMalloc error out of memory [2])
  • It seems the GPU is out of memory. BUT!!!! I am using an A100 with 80 GB of memory, and the error happens when only 42 GB are in use. I don't know why, please help me.

  • nvidia-smi:

Mon Oct  3 08:30:34 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:CF:00.0 Off |                    0 |
| N/A   31C    P0    74W / 400W |  42174MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

yangsp5 avatar Oct 03 '22 08:10 yangsp5

Could you provide more details about which methods you are using from faiss-gpu? A minimal working example would be useful so we can help you pinpoint the issue @yangsp5 , thanks!

mlomeli1 avatar Oct 04 '22 20:10 mlomeli1

the code:

import faiss
import numpy as np
import torch  # needed for torch.load below


dim = 768
num_threads = 10
use_gpu = True

faiss.omp_set_num_threads(num_threads)
index = faiss.IndexFlatIP(dim)
index = faiss.IndexIDMap(index)
if use_gpu:
    index = faiss.index_cpu_to_all_gpus(index)


# insert into faiss with ids
# filepaths: list of saved embedding files (defined elsewhere)
for fp in filepaths:
    data = torch.load(fp)
    for embeddings, ids in data:
        # embeddings.shape  ---> [256000, 768]
        # ids ---> [1, 2, 3, 4, ....., 256000]
        index.add_with_ids(embeddings, np.array(ids))

yangsp5 avatar Oct 08 '22 02:10 yangsp5

This is due to the geometric doubling behavior of faiss::gpu::DeviceVector's append, which happens when you call add on an index that already has data in it:

https://github.com/facebookresearch/faiss/blob/main/faiss/gpu/utils/DeviceVector.cuh#L94

This is something we plan to fix: above a certain allocation size, the buffer will no longer double but instead grow by a much smaller factor, or even be sized exactly.
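A minimal sketch of why this hits the ~42 GB wall (the doubling policy is as in the linked DeviceVector.cuh; the batch size is taken from the snippet above, and the loop count is illustrative):

```python
# Simulate geometric (doubling) capacity growth: once the device vector
# holds roughly half of free GPU memory, the next append that overflows
# capacity asks cudaMalloc for about twice the current size at once.
def grow(capacity, needed):
    # double until the request fits (assumed growth policy)
    while capacity < needed:
        capacity *= 2
    return capacity

batch = 256_000 * 768 * 4      # one float32 batch, ~0.73 GiB
capacity = batch               # first add sizes the buffer to one batch
stored = 0
for _ in range(40):            # keep appending batches
    stored += batch
    capacity = grow(capacity, stored)

print(f"{capacity / 2**30:.1f} GiB capacity for {stored / 2**30:.1f} GiB of data")
```

So a failing ~42 GB cudaMalloc can be triggered while the index itself only holds ~21 GB of vectors, which matches the nvidia-smi output above.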

In the meantime, even though this is a very large amount of data (more than 40 GB), you may be able to avoid the issue by accumulating all of the data on the CPU and calling add_with_ids only once, so the GPU buffer is sized in a single allocation. This may not be feasible in your setting, but I understand the constraints.
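A sketch of that restructuring, with small random batches standing in for the tensors loaded from `filepaths` (the shapes here are illustrative, and the final faiss call is left commented since it needs a GPU index):

```python
import numpy as np

# Hypothetical stand-ins for the original loop's (embeddings, ids) pairs.
batches = [(np.random.rand(1000, 768).astype('float32'),
            np.arange(i * 1000, (i + 1) * 1000)) for i in range(5)]

# Concatenate everything on the CPU first ...
all_embeddings = np.concatenate([emb for emb, _ in batches])
all_ids = np.concatenate([ids for _, ids in batches]).astype('int64')

# ... then a single call sizes the GPU buffer exactly once, instead of
# repeatedly reallocating and doubling as batches arrive:
# index.add_with_ids(all_embeddings, all_ids)
```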

Also, Faiss indices accept Torch tensors directly if you import faiss.contrib.torch_utils, so no conversion to NumPy is needed.
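For example, a sketch assuming torch is installed (`index` stands for the IndexIDMap built above; the faiss lines are commented so the snippet stands alone):

```python
import torch

# import faiss
# import faiss.contrib.torch_utils  # patches index methods to accept torch tensors

embeddings = torch.rand(1000, 768, dtype=torch.float32)
ids = torch.arange(1, 1001, dtype=torch.int64)

# With torch_utils imported, no .numpy() round-trip is needed:
# index.add_with_ids(embeddings, ids)
```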

wickedfoo avatar Oct 10 '22 22:10 wickedfoo

thanks

yangsp5 avatar Oct 18 '22 07:10 yangsp5