Faiss GPU: improve error information for GPU OOM
Summary:
This diff updates the error reporting for GPU out-of-memory errors, whether they come from cudaMalloc directly or from the RAFT allocator. On a memory error, the allocator state (including the CUDA-reported free memory on the device) is returned as part of the exception message, like this:
C++ exception with description "Error in virtual void *faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest &) at fbcode/faiss/gpu/StandardGpuResources.cpp:570: StandardGpuResources: Faiss device allocator fail type IVFLists dev 1 space Device stream 0x7fa07623b440 size 1024 bytes
Allocator state:
GPU device 1 allocator state:
==========
Device free memory: 82400968704 bytes
Allocator temp memory remaining: 1610612720
Outstanding Faiss allocations:
Alloc type TemporaryMemoryBuffer: 1 allocations, 1610612736 bytes
Alloc type FlatData: 2 allocations, 59648 bytes
In the case where Faiss is built using RAFT, previously no error information was provided if the RAFT memory manager had an OOM error, but now it will produce a string similar to the above. The Faiss memory manager (StandardGpuResources) continues to log all allocations made and passed to the RAFT memory manager, so we can also receive an indication of what is allocated and for what purpose.
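The per-type bookkeeping behind the "Outstanding Faiss allocations" lines can be sketched roughly as follows. This is a minimal illustration, not the code in StandardGpuResources: AllocTracker, AllocTally, recordAlloc, recordFree and describe are hypothetical names, and the allocation type is modeled as a plain string.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch only; these names do not exist in Faiss.
struct AllocTally {
    std::size_t count = 0; // number of live allocations of this type
    std::size_t bytes = 0; // total live bytes of this type
};

class AllocTracker {
public:
    // Record a successful allocation under its type label.
    void recordAlloc(const std::string& type, std::size_t bytes) {
        AllocTally& t = tallies_[type];
        t.count += 1;
        t.bytes += bytes;
    }

    // Record a deallocation; unknown types are ignored.
    void recordFree(const std::string& type, std::size_t bytes) {
        auto it = tallies_.find(type);
        if (it == tallies_.end()) {
            return;
        }
        it->second.count -= 1;
        it->second.bytes -= bytes;
        if (it->second.count == 0) {
            tallies_.erase(it);
        }
    }

    // Build a state string to attach to an OOM exception message.
    // deviceFreeBytes is assumed to come from the caller (for example
    // from a CUDA free-memory query); it is not queried here.
    std::string describe(std::size_t deviceFreeBytes) const {
        std::ostringstream out;
        out << "Device free memory: " << deviceFreeBytes << " bytes\n";
        out << "Outstanding allocations:\n";
        for (const auto& kv : tallies_) {
            out << "Alloc type " << kv.first << ": " << kv.second.count
                << " allocations, " << kv.second.bytes << " bytes\n";
        }
        return out.str();
    }

private:
    std::map<std::string, AllocTally> tallies_;
};
```

Keeping the tally outside the underlying allocator is what lets the same summary be produced whether the memory ultimately comes from cudaMalloc or from another memory manager.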
In addition, this fixes an issue where Faiss GPU would not compile (in fbcode at least) if the USE_NVIDIA_RAFT define was not set. The library now compiles both with and without RAFT.
Also updated the #if defined USE_NVIDIA_RAFT checks to #ifdef USE_NVIDIA_RAFT to better conform to the rest of the GPU code.
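For reference, the two preprocessor spellings test the same condition, so the change is stylistic. A small sketch (kBuiltWithRaft and kBuiltWithRaftAlt are illustrative names, not Faiss symbols) that compiles whether or not the macro is passed by the build:

```cpp
// USE_NVIDIA_RAFT is assumed to be supplied by the build system when RAFT
// is enabled (for example via -DUSE_NVIDIA_RAFT); it is left unset here.

#ifdef USE_NVIDIA_RAFT
constexpr bool kBuiltWithRaft = true;
#else
constexpr bool kBuiltWithRaft = false;
#endif

#if defined USE_NVIDIA_RAFT
constexpr bool kBuiltWithRaftAlt = true;
#else
constexpr bool kBuiltWithRaftAlt = false;
#endif

// Both forms only ask whether the macro is defined, so they always agree.
static_assert(kBuiltWithRaft == kBuiltWithRaftAlt,
              "#ifdef X and #if defined X are equivalent");
```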
This diff also disables the up-front 1.5 GB temporary memory allocation when RAFT is being used, which is the intended behavior when the RAFT memory manager handles allocations. Apart from that, this diff does not change the runtime behavior of Faiss GPU; its purpose is to make GPU OOM issues in Faiss usage easier to debug.
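The temporary memory rule above amounts to something like the following sketch. The function and constant names are hypothetical, and the RAFT setting is modeled as a runtime flag purely so both cases can be exercised; the byte count matches the TemporaryMemoryBuffer size shown in the example message.

```cpp
#include <cstddef>

// Hypothetical names; not the actual Faiss symbols.
// 1.5 GiB = 1536 * 1024 * 1024 = 1610612736 bytes.
constexpr std::size_t kDefaultTempMemBytes =
        static_cast<std::size_t>(1536) * 1024 * 1024;

// Size of the temporary memory buffer reserved up front.
// When the RAFT memory manager is in use, nothing is reserved.
constexpr std::size_t initialTempMemoryBytes(bool usingRaft) {
    return usingRaft ? 0 : kDefaultTempMemBytes;
}
```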
Reviewed By: mdouze
Differential Revision: D49260364