FlashRAG

Loading wiki_100w_e5_index fails with Error: 'err == cudaSuccess' failed, even with 80G of GPU memory

Open duyuwen-duen opened this issue 9 months ago • 6 comments

RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244517602/work/faiss/gpu/StandardGpuResources.cpp:530: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 0 space Device stream 0xbcdba10 size 32279537664 bytes (cudaMalloc error out of memory [2])

Apr 06 '25 02:04 duyuwen-duen

If you are using faiss-gpu, try to spread the index across multiple GPUs so that they share the memory load.
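As a rough illustration of this advice, the sketch below shards a CPU index across all visible GPUs with faiss. It is not FlashRAG's own loading code: the function name `load_index_sharded` and the `use_fp16` default are choices made here for illustration, and the faiss import is deferred so the file can be imported on a machine without faiss-gpu.

```python
import os


def load_index_sharded(index_path: str, use_fp16: bool = True):
    """Read a faiss index from disk and shard it across all visible GPUs.

    Hypothetical helper for illustration; not part of FlashRAG.
    """
    if not os.path.isfile(index_path):
        raise FileNotFoundError(f"index file not found: {index_path}")

    # Deferred import: requires the faiss-gpu package and at least one CUDA device.
    import faiss

    cpu_index = faiss.read_index(index_path)

    co = faiss.GpuMultipleClonerOptions()
    co.shard = True           # split the vectors across GPUs instead of replicating them
    co.useFloat16 = use_fp16  # store vectors in half precision on the GPU

    # With the default ngpu=-1, faiss uses every GPU visible to the process.
    return faiss.index_cpu_to_all_gpus(cpu_index, co=co)
```

If `co.shard` is left at its default of `False`, faiss places a full replica of the index on every GPU, so adding cards does not reduce per-card memory. The set of GPUs the process can see can be narrowed with the `CUDA_VISIBLE_DEVICES` environment variable.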

Apr 06 '25 04:04 ignorejjj

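Spreading the index across multiple GPUs does solve it, but I have a question. With four GPUs, each one uses about 10G, which is only 40G in total. So why does loading on a single GPU report OOM, when that single GPU has 80G of memory?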

Apr 12 '25 02:04 duyuwen-duen

I have not looked into this closely. My guess is that faiss applies some optimization to distributed indexes.
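One further possibility, offered as a reading of the traceback and not as a confirmed diagnosis: the failed allocation in the original report is of type TemporaryMemoryOverflow, which in faiss refers to scratch memory that did not fit in the reserved temporary pool, not to the index's permanent storage. Peak memory while loading can therefore exceed the steady-state usage that a GPU monitor shows afterwards. The arithmetic below only restates the number from the traceback; the even four-way split is an assumption.

```python
# Size of the failed request, copied from the traceback in the original report.
FAILED_REQUEST_BYTES = 32_279_537_664

GB = 10**9
failed_gb = FAILED_REQUEST_BYTES / GB

# Assumption (not confirmed in this thread): the resident index is about as
# large as that request, and sharding divides it evenly across the GPUs.
n_gpus = 4
per_gpu_gb = failed_gb / n_gpus

print(f"failed temporary request: {failed_gb:.2f} GB")
print(f"even {n_gpus}-way share of that size: {per_gpu_gb:.2f} GB per GPU")
```

Under that assumption, roughly 8 GB of index data per card plus runtime overhead is in the same range as the 10G per card reported above, while the single-GPU run had to satisfy one 32 GB temporary request on top of whatever was already resident.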

Apr 12 '25 04:04 ignorejjj

Got it, thanks~

Apr 13 '25 11:04 duyuwen-duen

How do I run faiss-gpu across multiple GPUs? This is my config file: gpu_id: "0,1,2,3,4,5,6,7". I am using 8 A100 (40G) cards, but I still get this error: RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /project/faiss/faiss/gpu/StandardGpuResources.cpp:577: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type FlatData dev 0 space Device stream 0x1b08dafa0 size 4034941440 bytes (cudaMalloc error out of memory [2]). Does this mean that multi-GPU loading is not actually taking effect?

Apr 17 '25 03:04 xiariyuni

It may be that one of the GPUs ran out of memory.
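The sizes in the two tracebacks give a hint here. The request that failed on dev 0 in the 8-GPU run (4034941440 bytes, type FlatData) is almost exactly one eighth of the request that failed in the original single-GPU report. That is what one would expect if the index were being sharded eight ways and device 0 did not have about 4 GB free. This is an inference from the two numbers, assuming both reports load the same index. The 768-dimensional float16 layout used below is also an assumption that is not stated anywhere in the thread.

```python
SINGLE_GPU_REQUEST = 32_279_537_664  # bytes, from the single-GPU traceback
EIGHT_GPU_REQUEST = 4_034_941_440    # bytes, from the 8-GPU traceback

ratio = SINGLE_GPU_REQUEST / EIGHT_GPU_REQUEST
print(f"ratio of the two failed requests: {ratio:.4f}")

# Assumption: 768-dimensional embeddings stored as float16 (2 bytes per value).
BYTES_PER_VECTOR = 768 * 2

full_vectors, full_rem = divmod(SINGLE_GPU_REQUEST, BYTES_PER_VECTOR)
shard_vectors, shard_rem = divmod(EIGHT_GPU_REQUEST, BYTES_PER_VECTOR)

print(f"vectors implied by the single-GPU request: {full_vectors} (remainder {full_rem})")
print(f"vectors implied by the 8-GPU request:      {shard_vectors} (remainder {shard_rem})")
print(f"one eighth of the former, rounded down:    {full_vectors // 8}")
```

Both sizes divide evenly by the assumed vector size, and the smaller one matches an eight-way share of the larger. If that reading is right, sharding is active and the thing to check is what else occupies GPU 0 at load time, for example with `nvidia-smi`.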

Apr 17 '25 03:04 ignorejjj