faiss::gpu::runMatrixMult ... cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256

Open anirudhajith opened this issue 2 years ago • 23 comments

Summary

I'm trying to train an IVFPQ index for 100000 768-dimensional embeddings on an NVIDIA GPU with 40537MiB of memory. The code fails at index.train() with the following error message:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256
Aborted (core dumped)
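For readers decoding the message: in the cuBLAS headers, status 13 is `CUBLAS_STATUS_EXECUTION_FAILED`, meaning the GEMM kernel failed to run on the device rather than being rejected for bad arguments. The inner dimension 12 matches 768 / 64 (the vector dimension divided by the number of PQ sub-quantizers), and 256 matches the number of centroids per sub-quantizer. The 1024 is presumably a batch of training sub-vectors. Below is a minimal sketch that pulls these values out of the message. The helper name `describe_cublas_failure` is hypothetical and not part of Faiss.

```python
import re

# Subset of cublasStatus_t values as defined in the cuBLAS headers.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
}


def describe_cublas_failure(message):
    """Hypothetical helper: return (status name, matrix shapes) from a Faiss error line."""
    code = int(re.search(r"cublas failed \((\d+)\)", message).group(1))
    # Shapes look like "(1024, 12)"; the bare "(13)" has no comma and is skipped.
    shapes = [
        tuple(int(part) for part in pair.split(","))
        for pair in re.findall(r"\((\d+, \d+)\)", message)
    ]
    return CUBLAS_STATUS.get(code, "unknown status"), shapes


message = "cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256)"
status, shapes = describe_cublas_failure(message)
print(status, shapes)

# Sanity check against the index parameters in this report:
# 768 dimensions split over 64 sub-quantizers gives 12 dimensions each.
assert 768 // 64 == shapes[0][1]
```

An execution failure on shapes this small suggests the problem lies in the binary or driver pairing rather than in the index parameters, which is consistent with the later reports in this thread.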

Platform

OS: Ubuntu 20.04

Faiss version: faiss-gpu 1.7.1.post2

Installed from: anaconda (pip install faiss-gpu)

Faiss compilation options: Nothing explicitly

Running on:

[ ] CPU
[x] GPU

Interface:

[ ] C++
[x] Python

Reproduction instructions

import faiss
from math import log2

# train_embeddings has shape (100000, 768) and is loaded elsewhere
flatK, D, K = 100, 64, 256       # IVF lists, PQ sub-quantizers, centroids per sub-quantizer
res = faiss.StandardGpuResources()
n = train_embeddings.shape[1]    # n = 768
quantizer = faiss.IndexFlatL2(n)
index = faiss.IndexIVFPQ(quantizer, n, flatK, D, round(log2(K)))    # 8 bits per code
co = faiss.GpuClonerOptions()
co.useFloat16 = True
index = faiss.index_cpu_to_gpu(res, 2, index, co)    # to use GPU2 on a multi-GPU VM
index.train(train_embeddings)                        # error

Sep 23 '21 18:09 anirudhajith

What is your CUDA version? If it is >=11.2, have you tried on CUDA 10?

Sep 24 '21 08:09 xzyaoi

@xzyaoi My CUDA version is 10.1. I also checked if the same code runs correctly with the GPU-specific lines commented out, and it does. I'm still not able to get it to run using the GPU though.

Sep 24 '21 08:09 anirudhajith

What kind of GPU are you using? 40 GiB makes me think of A100, which really should require CUDA 11?

Oct 07 '21 18:10 wickedfoo

Hi @wickedfoo, my CUDA version is 11 and I still get the error.

Dec 23 '21 06:12 MotiBaadror

I get the same error. Have you solved it yet?

Jan 24 '22 05:01 monchin

I have the same error with RTX3090

Mar 02 '22 03:03 tengteng-Lin

I changed the version of faiss gpu and it worked

Mar 02 '22 15:03 MotiBaadror

@MotiBaadror Can you tell us what CUDA, faiss, faiss-gpu, etc. versions were when you finally managed to get it to work? Were you using A100 GPUs?

Mar 02 '22 16:03 anirudhajith

I have the same error with an RTX 3090. Any help would be appreciated.

Mar 21 '22 09:03 zhoujianch

I solved it by changing the version, and the current version is 1.7.2

Mar 22 '22 08:03 tengteng-Lin

@tengteng-Lin Thanks for your reply.
I have both faiss-cpu=1.7.2 and faiss-gpu=1.7.2 installed, but it still does not work for me. Did you compile the library from source?

Mar 22 '22 08:03 zhoujianch

Seeing this with A100 / CUDA 11.5 / faiss-gpu=1.7.2

Mar 30 '22 22:03 TevenLeScao

Seeing this with A100 / CUDA 11.1 / faiss-gpu=1.7.2. The error occurs at the search step of a flat index.

May 10 '22 09:05 Yu-Shi

I am running into the same issue on RTX3090. Ubuntu, Driver 510.73.05; Cuda: 11.6

Jun 20 '22 07:06 F0rt1s

I get this error with CUDA 11.6 / RTX 3090 / faiss-gpu=1.7.2.

Jun 24 '22 18:06 ghost

Seeing this with cuda 11.1 rtx3090 faiss-gpu=1.7.2

Jul 21 '22 05:07 AlexGreason

Same error with:

faiss-gpu==1.7.2 as well as faiss-gpu==1.6.5
cudatoolkit==11.6.0 and cudatoolkit==11.3.1
Quadro RTX 5000

Reinstalling from scratch and upgrading or downgrading faiss did not solve the problem. Any hints would be appreciated.

Jul 30 '22 16:07 vlievin

Same here: faiss-gpu 1.7.2 cuda 11.6 RTX A5000

Aug 16 '22 22:08 future-xy

Same with faiss-gpu 1.7.2 CUDA 11.7 RTX 3090

Sep 05 '22 05:09 zhangxiangxiao

Update: The error occurs when I use the faiss-gpu pip package from https://github.com/kyamagu/faiss-wheels (on Rocky Linux 9 with Python 3.9 and CUDA 11.7). If I use Anaconda3 with Python 3.8 and install faiss-gpu from the pytorch conda repo with CUDA 11.3 (which is the officially supported method), the error no longer appears. Perhaps this should have been an issue in that repo instead.
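Since the outcome here depends on where the faiss-gpu binary came from, it can help to include that in a report. The sketch below uses only the standard library to list which faiss-related distributions are installed, their versions, and the installer recorded in their metadata. The function name `faiss_install_report` is hypothetical, the list of distribution names is a guess at what may be present, and the recorded installer string depends on how the package was built, so treat the output as a hint rather than proof.

```python
import sys
from importlib import metadata


def faiss_install_report(names=("faiss-gpu", "faiss-cpu", "faiss", "torch")):
    """Return {distribution name: (version, installer)} for the names that are installed."""
    report = {}
    for name in names:
        try:
            dist = metadata.distribution(name)
        except metadata.PackageNotFoundError:
            continue  # not installed under this name
        # INSTALLER is an optional metadata file, typically containing "pip" or "conda".
        installer = (dist.read_text("INSTALLER") or "unknown").strip()
        report[name] = (dist.version, installer)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, (version, installer) in faiss_install_report().items():
        print(name, version, "installed by", installer)
```

Pasting this output alongside the CUDA and driver versions makes it easier to tell the pip wheel and the conda package apart when comparing reports.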

Sep 12 '22 03:09 zhangxiangxiao

Thank you so much. I also fixed this issue on an A100 GPU by following your suggestion. My environment is python==3.8, cuda==11.3, faiss-gpu==1.7.2, torch==1.9.1+cu111.

Nov 07 '22 07:11 Victorwz

Environment: python==3.9, cuda==11.4, faiss-gpu==1.7.4 / A100

I hit the same error in the environment above, before I had tried downgrading the Python version.

After downgrading Python to 3.8 and following the steps in https://github.com/facebookresearch/faiss/issues/2064#issuecomment-1243177673, it works!

Jul 19 '23 21:07 Kin-Zhang

It helped me to install a specific wheel with faiss-gpu==1.7.3:

pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Dec 21 '23 15:12 vikmary
