faiss::gpu::runMatrixMult ... cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256

Open anirudhajith opened this issue 2 years ago • 23 comments

Summary

I'm trying to train an IVFPQ index for 100000 768-dimensional embeddings on an NVIDIA GPU with 40537MiB of memory. The code fails at index.train() with the following error message:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (1024, 12) x (256, 12)' = (1024, 256) gemm params m 256 n 1024 k 12 trA T trB N lda 12 ldb 12 ldc 256
Aborted (core dumped)

Platform

OS: Ubuntu 20.04

Faiss version: faiss-gpu 1.7.1.post2

Installed from: anaconda (pip install faiss-gpu)

Faiss compilation options: Nothing explicitly

Running on:

  • [ ] CPU
  • [x] GPU

Interface:

  • [ ] C++
  • [x] Python

Reproduction instructions

import faiss
from math import log2

# train_embeddings is an array of shape (100000, 768), defined elsewhere
flatK, D, K = 100, 64, 256           # IVF lists, PQ sub-quantizers, centroids per sub-quantizer
res = faiss.StandardGpuResources()
n = train_embeddings.shape[1]        # n = 768
quantizer = faiss.IndexFlatL2(n)
index = faiss.IndexIVFPQ(quantizer, n, flatK, D, round(log2(K)))    # round(log2(256)) = 8 bits per code
co = faiss.GpuClonerOptions()
co.useFloat16 = True
index = faiss.index_cpu_to_gpu(res, 2, index, co)    # to use GPU2 on a multi-GPU VM
index.train(train_embeddings)                        # error

anirudhajith avatar Sep 23 '21 18:09 anirudhajith
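
For reference, some of the numbers in the error message line up with the index parameters in the reproduction. The following is a minimal sketch of that arithmetic, assuming the usual product-quantizer layout in which the vector dimension is split evenly across the sub-quantizers. The names `d`, `M`, `dsub` and `ksub` are illustrative and do not come from the original post.

```python
from math import log2

d = 768                   # embedding dimension (train_embeddings.shape[1])
M = 64                    # number of PQ sub-quantizers (D in the reproduction)
K = 256                   # centroids per sub-quantizer
nbits = round(log2(K))    # bits per PQ code, as passed to IndexIVFPQ

# Assumption: the dimension is split evenly across the sub-quantizers.
assert d % M == 0, "d must be divisible by the number of sub-quantizers"
dsub = d // M             # dimensions handled by each sub-quantizer
ksub = 2 ** nbits         # centroids per sub-quantizer

print(nbits, dsub, ksub)  # prints: 8 12 256
```

The inner dimension `k 12` and the `(256, 12)` operand in the failing GEMM match `dsub` and `ksub`, which suggests the failure happens while training the product quantizer. The thread itself does not confirm this.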

What is your CUDA version? If it is >=11.2, have you tried on CUDA 10?

xzyaoi avatar Sep 24 '21 08:09 xzyaoi

@xzyaoi My CUDA version is 10.1. I also checked if the same code runs correctly with the GPU-specific lines commented out, and it does. I'm still not able to get it to run using the GPU though.

anirudhajith avatar Sep 24 '21 08:09 anirudhajith

What kind of GPU are you using? 40 GiB makes me think of A100, which really should require CUDA 11?

wickedfoo avatar Oct 07 '21 18:10 wickedfoo

Hi @wickedfoo, my CUDA version is 11, but I still get the error.

MotiBaadror avatar Dec 23 '21 06:12 MotiBaadror

The same error, have you solved it now?

monchin avatar Jan 24 '22 05:01 monchin

I have the same error with RTX3090

tengteng-Lin avatar Mar 02 '22 03:03 tengteng-Lin

I changed the version of faiss gpu and it worked

MotiBaadror avatar Mar 02 '22 15:03 MotiBaadror

@MotiBaadror Can you tell us what CUDA, faiss, faiss-gpu, etc. versions were when you finally managed to get it to work? Were you using A100 GPUs?

anirudhajith avatar Mar 02 '22 16:03 anirudhajith

I have the same error with RTX3090, please help me ???

zhoujianch avatar Mar 21 '22 09:03 zhoujianch

I solved it by changing the version; my current version is 1.7.2.

tengteng-Lin avatar Mar 22 '22 08:03 tengteng-Lin

@tengteng-Lin thanks for your reply.
My faiss versions are faiss-cpu=1.7.2 and faiss-gpu=1.7.2, but it still does not work for me. Did you compile the library from source?

zhoujianch avatar Mar 22 '22 08:03 zhoujianch

Seeing this with A100 / CUDA 11.5 / faiss-gpu=1.7.2

TevenLeScao avatar Mar 30 '22 22:03 TevenLeScao

Seeing this with A100 / CUDA 11.1 / faiss-gpu=1.7.2. The error occurs at the search step of a flat index.

Yu-Shi avatar May 10 '22 09:05 Yu-Shi
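
Because the failure is a Faiss assertion that aborts the whole process, it cannot be caught with a Python `try`/`except`. One way to check whether a given install is affected is to run a tiny GPU flat-index search in a child process and inspect its exit code. This is only a sketch: it assumes GPU 0 is the device to test, the helper name `gpu_smoke_test` and the problem sizes are made up for illustration, and a search this small may not exercise the same cuBLAS path as the reports above.

```python
import subprocess
import sys

# Child script: build a small flat L2 index on GPU 0 and run one search.
SMOKE_TEST = """
import numpy as np
import faiss

d = 64
xb = np.random.rand(1000, d).astype("float32")
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, faiss.IndexFlatL2(d))
index.add(xb)
index.search(xb[:5], 4)
"""


def gpu_smoke_test(timeout=120):
    """Return True if the child process exits cleanly, False otherwise."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", SMOKE_TEST],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    if proc.returncode != 0:
        # On POSIX, a negative return code means the child was killed by a
        # signal, for example SIGABRT raised by a failed assertion.
        print(f"smoke test failed (exit code {proc.returncode})")
        print(proc.stderr[-2000:])
    return proc.returncode == 0
```

Calling `gpu_smoke_test()` before and after switching packages gives a quick yes/no answer without crashing the calling process.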

I am running into the same issue on RTX3090. Ubuntu, Driver 510.73.05; Cuda: 11.6

F0rt1s avatar Jun 20 '22 07:06 F0rt1s

I get this error with CUDA 11.6, RTX 3090, faiss-gpu=1.7.2.

ghost avatar Jun 24 '22 18:06 ghost

Seeing this with cuda 11.1 rtx3090 faiss-gpu=1.7.2

AlexGreason avatar Jul 21 '22 05:07 AlexGreason

Same error with:

  • faiss-gpu==1.7.2 as well as faiss-gpu==1.6.5
  • cudatoolkit==11.6.0 and cudatoolkit==11.3.1
  • Quadro RTX 5000

Reinstalling from scratch and upgrading or downgrading faiss did not solve the problem. Any hint would be appreciated.

vlievin avatar Jul 30 '22 16:07 vlievin

Same here: faiss-gpu 1.7.2 cuda 11.6 RTX A5000

future-xy avatar Aug 16 '22 22:08 future-xy

Same with faiss-gpu 1.7.2 CUDA 11.7 RTX 3090

zhangxiangxiao avatar Sep 05 '22 05:09 zhangxiangxiao

Update: The error occurs when I use the faiss-gpu PIP package from https://github.com/kyamagu/faiss-wheels (in Rocky Linux 9 with Python 3.9 and CUDA 11.7). If I use Anaconda3 with Python 3.8 and install the faiss-gpu from pytorch conda repo with cuda 11.3 (which is the officially supported manner), the error no longer appears. Perhaps this should have been an issue in that repo instead.

zhangxiangxiao avatar Sep 12 '22 03:09 zhangxiangxiao
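
Since the outcome appears to depend on where the package was installed from, it may help to report the same details in every reply. The sketch below collects them. The helper name `describe_faiss_env` is hypothetical, and `get_num_gpus` is looked up defensively on the assumption that it may be absent from CPU-only builds.

```python
import importlib
import platform


def describe_faiss_env():
    """Collect the version details that replies in this thread report."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        faiss = importlib.import_module("faiss")
    except ImportError:
        info["faiss"] = None  # faiss is not installed in this environment
        return info
    info["faiss"] = getattr(faiss, "__version__", "unknown")
    # The install path hints at whether the package came from pip or conda.
    info["faiss_file"] = getattr(faiss, "__file__", "unknown")
    # Assumption: get_num_gpus may be missing in CPU-only builds.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    info["num_gpus"] = get_num_gpus() if callable(get_num_gpus) else None
    return info


for key, value in describe_faiss_env().items():
    print(f"{key}: {value}")
```

The CUDA toolkit and driver versions are not covered here and still need to be reported separately.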

Thank you so much. I also fixed this issue on an A100 GPU by following your suggestion. My environment is python==3.8, cuda==11.3, faiss-gpu==1.7.2, torch==1.9.1+cu111.

Victorwz avatar Nov 07 '22 07:11 Victorwz

  • env: python==3.9, cuda==11.4, faiss-gpu==1.7.4 / A100

I hit this in the environment above as well, but have not yet tried downgrading the Python version.

Update: after downgrading Python to 3.8 and following https://github.com/facebookresearch/faiss/issues/2064#issuecomment-1243177673, it works!

Kin-Zhang avatar Jul 19 '23 21:07 Kin-Zhang

It helped me to install a specific wheel with faiss-gpu==1.7.3:

pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

vikmary avatar Dec 21 '23 15:12 vikmary
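
The wheel above is tagged `cp310`, so it only installs on CPython 3.10. A small sketch for checking that a wheel filename matches the running interpreter follows. It assumes a CPython interpreter and the standard wheel filename layout, and the helper names are illustrative.

```python
import sys


def cpython_tag():
    """Return the CPython wheel tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_python_tag(wheel_name):
    """Extract the Python tag from a wheel filename or URL.

    Wheel filenames end with <python tag>-<abi tag>-<platform tag>.whl,
    so the Python tag is the third field from the end.
    """
    filename = wheel_name.rsplit("/", 1)[-1]
    return filename.split("-")[-3]


wheel = "faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(wheel_python_tag(wheel), cpython_tag())
print("match" if wheel_python_tag(wheel) == cpython_tag() else "mismatch")
```

On a mismatch, pip rejects the wheel as unsupported on the current platform, so a wheel built for the matching Python version is needed.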