faiss::gpu::runMatrixMult ... cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256
Summary
I'm trying to train an IVFPQ index on 100,000 768-dimensional embeddings on an NVIDIA GPU with 40537 MiB of memory. The code fails at index.train() with the following error message:
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256
Aborted (core dumped)
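A minimal sketch for decoding this kind of failure, assuming the message format shown above: it extracts the cuBLAS status code and the GEMM parameters. The status-name table is transcribed from the cublasStatus_t enum in cuBLAS; decode_cublas_failure is a hypothetical helper, not part of Faiss. Status 13 is CUBLAS_STATUS_EXECUTION_FAILED (the GPU program failed to execute, often a kernel launch failure), not CUBLAS_STATUS_ALLOC_FAILED (3), so the message by itself does not indicate that the GPU ran out of memory.

```python
import re

# Values of the cublasStatus_t enum from the cuBLAS headers.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_failure(message):
    """Hypothetical helper: pull the status code and GEMM params out of a Faiss error."""
    status = re.search(r"cublas failed \((\d+)\)", message)
    if status is None:
        return None
    code = int(status.group(1))
    # "gemm params m 256 n 1024 k 12 ... ldc 256" -> {"m": 256, "n": 1024, "k": 12, ...}
    params = {
        key: int(value)
        for key, value in re.findall(r"\b(m|n|k|lda|ldb|ldc) (\d+)", message)
    }
    return {
        "code": code,
        "name": CUBLAS_STATUS.get(code, "UNKNOWN"),
        "gemm": params,
    }


msg = ("cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) "
       "gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256")
print(decode_cublas_failure(msg))
```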
Platform
OS: Ubuntu 20.04
Faiss version: faiss-gpu 1.7.1.post2
Installed from: pip (pip install faiss-gpu) inside an Anaconda environment
Faiss compilation options: none set explicitly
Running on:
- [ ] CPU
- [x] GPU
Interface:
- [ ] C++
- [x] Python
Reproduction instructions
import faiss
from math import log2

flatK, D, K = 100, 64, 256  # IVF lists, PQ sub-quantizers, centroids per sub-quantizer
res = faiss.StandardGpuResources()
n = train_embeddings.shape[1]  # train_embeddings: float32 array of shape (100000, 768), so n = 768
quantizer = faiss.IndexFlatL2(n)
index = faiss.IndexIVFPQ(quantizer, n, flatK, D, round(log2(K)))  # 8 bits per sub-quantizer code
co = faiss.GpuClonerOptions()
co.useFloat16 = True
index = faiss.index_cpu_to_gpu(res, 2, index, co)  # use GPU 2 on a multi-GPU VM
index.train(train_embeddings)  # fails with the cuBLAS error
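The parameters in the comment above (d = 768, D = 64 sub-quantizers, K = 256 centroids each) line up with the shapes in the failing GEMM: each PQ sub-vector has 768 / 64 = 12 dimensions, matching "k 12", and each sub-quantizer has 256 centroids, matching "m 256". That suggests, though nobody in the thread confirms it, that the crash happens while the PQ sub-quantizers are trained rather than the 100-list coarse quantizer (whose GEMMs would have 768 columns). The training set itself is small, about 293 MiB of float32 data. A quick arithmetic check, using a hypothetical helper name:

```python
from math import log2


def pq_geometry(d, m, ksub):
    """Hypothetical helper: sub-vector size and code size of a product quantizer."""
    if d % m != 0:
        # The product quantizer needs the dimension to split evenly across sub-quantizers.
        raise ValueError(f"d={d} is not divisible by m={m}")
    nbits = round(log2(ksub))
    if 2 ** nbits != ksub:
        raise ValueError(f"ksub={ksub} is not a power of two")
    return {
        "dsub": d // m,                      # dimensions per sub-vector
        "nbits": nbits,                      # bits per sub-quantizer code
        "code_bytes": (m * nbits + 7) // 8,  # bytes per encoded vector
    }


geo = pq_geometry(d=768, m=64, ksub=256)
print(geo)  # {'dsub': 12, 'nbits': 8, 'code_bytes': 64}

# Size of the float32 training set: 100000 vectors x 768 dims x 4 bytes.
train_bytes = 100_000 * 768 * 4
print(f"{train_bytes / 2**20:.0f} MiB")  # 293 MiB
```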
What is your CUDA version? If it is >=11.2, have you tried on CUDA 10?
@xzyaoi My CUDA version is 10.1. I also checked whether the same code runs correctly with the GPU-specific lines commented out (i.e. on the CPU), and it does. I still can't get it to run on the GPU, though.
What kind of GPU are you using? 40 GiB makes me think of an A100, which should really require CUDA 11?
Hi @wickedfoo, my CUDA version is 11, but I still get the error.
Same error here. Have you solved it yet?
I have the same error with an RTX 3090.
I changed the faiss-gpu version and it worked.
(In reply to tengteng-Lin, Wed, Mar 2, 2022, 08:48)
@MotiBaadror Can you tell us which CUDA, faiss, faiss-gpu, etc. versions you were using when you finally got it to work? Were you using A100 GPUs?
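Since the thread keeps asking for exact versions, here is a minimal sketch for collecting them in one place. It assumes only the standard library plus whichever of faiss and torch happen to be installed (each is queried only if it imports); collect_env is a hypothetical name. faiss.__version__, faiss.get_num_gpus() and the torch.cuda / torch.version.cuda calls are existing APIs of those packages. Note that torch.version.cuda is the CUDA version PyTorch was built with, which can differ from the system CUDA toolkit and from the CUDA version faiss-gpu was built with.

```python
import platform


def collect_env():
    """Hypothetical helper: gather the version info requested in this thread."""
    env = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import faiss
        env["faiss"] = getattr(faiss, "__version__", "unknown")
        # Number of GPUs this faiss build can see (0 for CPU-only builds).
        get_num_gpus = getattr(faiss, "get_num_gpus", None)
        env["faiss_gpus"] = get_num_gpus() if get_num_gpus else 0
    except (ImportError, OSError):
        env["faiss"] = None  # not installed, or its shared libraries failed to load
    try:
        import torch
        env["torch"] = torch.__version__
        env["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
        if torch.cuda.is_available():
            # (device name, compute capability) for every visible GPU
            env["gpus"] = [
                (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
                for i in range(torch.cuda.device_count())
            ]
    except (ImportError, OSError):
        env["torch"] = None
    return env


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```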
I have the same error with an RTX 3090. Please help!
I solved it by changing the faiss version; the version I'm using now is 1.7.2.
(In reply to zhoujianch, 2022-03-21 17:39:36)
@tengteng-Lin Thanks for your reply.
I have both faiss-cpu=1.7.2 and faiss-gpu=1.7.2 installed, but it still does not work for me.
Did you compile the library from source?
Seeing this with A100 / CUDA 11.5 / faiss-gpu=1.7.2
Seeing this with A100 / CUDA 11.1 / faiss-gpu=1.7.2. The error occurs at the search step of a flat index.
I am running into the same issue on an RTX 3090 (Ubuntu, driver 510.73.05, CUDA 11.6).
I'm getting this error with CUDA 11.6 / RTX 3090 / faiss-gpu=1.7.2.
Seeing this with CUDA 11.1 / RTX 3090 / faiss-gpu=1.7.2.
Same error with:
- faiss-gpu==1.7.2 as well as faiss-gpu==1.6.5
- cudatoolkit==11.6.0 and cudatoolkit==11.3.1
- Quadro RTX 5000
Reinstalling from scratch and upgrading or downgrading faiss did not solve the problem. Any hint would be appreciated.
Same here: faiss-gpu 1.7.2, CUDA 11.6, RTX A5000
Same with faiss-gpu 1.7.2, CUDA 11.7, RTX 3090
Update: The error occurs when I use the faiss-gpu pip package from https://github.com/kyamagu/faiss-wheels (on Rocky Linux 9 with Python 3.9 and CUDA 11.7). If I use Anaconda3 with Python 3.8 and install faiss-gpu from the pytorch conda channel with CUDA 11.3 (the officially supported installation method), the error no longer appears. Perhaps this should have been an issue in that repo instead.
Thank you so much. I also fixed this issue on an A100 GPU by following your suggestion. My environment is python==3.8, cuda==11.3, faiss-gpu==1.7.2, torch==1.9.1+cu111.
Environment: python==3.9, cuda==11.4, faiss-gpu==1.7.4, A100
I also hit this error in the environment above, but haven't yet tried the suggested fix of downgrading Python.
After downgrading Python to 3.8 and following https://github.com/facebookresearch/faiss/issues/2064#issuecomment-1243177673, it works!
Installing this specific faiss-gpu==1.7.3 wheel helped me:
pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
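The wheel above is tagged cp310-cp310, so pip will only install it on CPython 3.10 (and the manylinux ... x86_64 platform tag limits it to x86-64 Linux). Because several fixes in this thread hinge on the Python version, here is a minimal sketch for checking a wheel's Python tag against an interpreter before installing it. It assumes the standard wheel filename layout (name-version[-build]-python-abi-platform.whl); wheel_tags and wheel_matches_interpreter are hypothetical helpers, and the packaging library's packaging.tags module is the more complete tool for this.

```python
import sys


def wheel_tags(filename):
    """Hypothetical helper: split a wheel filename into (python, abi, platform) tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # The last three dash-separated fields are the python, abi and platform tags,
    # whether or not an optional build tag is present.
    return parts[-3], parts[-2], parts[-1]


def wheel_matches_interpreter(filename, version_info=sys.version_info):
    """Hypothetical helper: does the wheel's CPython tag match this interpreter?"""
    python_tag, _, _ = wheel_tags(filename)
    wanted = f"cp{version_info[0]}{version_info[1]}"
    # A compressed tag set lists several values separated by dots, e.g. "cp39.cp310".
    return wanted in python_tag.split(".")


url = ("https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/"
       "faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl")
name = url.rsplit("/", 1)[-1]
print(wheel_tags(name))
print(wheel_matches_interpreter(name))  # True only on Python 3.10
```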