Problem with CUDA 11.4, and how to pip install Faiss for CUDA 11?

Open fanguu opened this issue 2 years ago • 10 comments

Summary

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 128) x (60000, 128)' = (512, 60000) gemm params m 60000 n 512 k 128 trA T trB N lda 128 ldb 128 ldc 60000

Platform

OS: ubuntu 20.04, RTX 3090

Faiss version: faiss-gpu-1.7.1.post2

Installed from: pip, Python 3.7

Running on:

  • [ ] CPU
  • [x] GPU

Interface:

  • [ ] C++
  • [x] Python

Reproduction instructions

fanguu avatar Aug 30 '21 13:08 fanguu
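
For readers decoding the report above: the number in `cublas failed (13)` is a `cublasStatus_t` value. The sketch below maps the values of the cuBLAS `cublasStatus_t` enum to their names. The helper name `decode_cublas_status` is hypothetical and is not part of Faiss. Code 13 corresponds to `CUBLAS_STATUS_EXECUTION_FAILED`, which is a different status from an allocation failure (`CUBLAS_STATUS_ALLOC_FAILED`, 3). The status alone does not identify the root cause.

```python
import re

# Values of the cuBLAS cublasStatus_t enum.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_status(message):
    """Hypothetical helper: name the status code in a 'cublas failed (N)' message."""
    match = re.search(r"cublas failed \((\d+)\)", message)
    if match is None:
        return None  # the message does not contain a cuBLAS status code
    code = int(match.group(1))
    return CUBLAS_STATUS.get(code, "unknown status %d" % code)


print(decode_cublas_status("details: cublas failed (13): (512, 128) x (60000, 128)'"))
```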

What are you trying to do when it fails?

mdouze avatar Sep 01 '21 16:09 mdouze

@mdouze I ran into the same problem. It seems to appear when I try to run clustering on the GPU. I followed example 3 from https://www.programcreek.com/python/example/112284/faiss.Clustering. Could you please take a look? Thanks.

wetliu avatar Dec 23 '21 06:12 wetliu

I have run into the same problem and look forward to your reply!

mqwfrog avatar Dec 28 '21 02:12 mqwfrog

Also with an RTX 3090?

mdouze avatar Jan 19 '22 11:01 mdouze

RTX A6000

wetliu avatar Jan 19 '22 16:01 wetliu

I ran into the same problem when running on an NVIDIA A100. I am using faiss-1.7.1 installed by pip.

Saltychtao avatar Jan 21 '22 02:01 Saltychtao

When I was training the index on 2x RTX 3090 GPUs using around 10M vectors, as train_gpu_script suggests, I ran into the same error:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 256) x (262144, 256)' = (512, 262144) gemm params m 262144 n 512 k 256 trA T trB N lda 256 ldb 256 ldc 262144

xuzhangda-patsnap avatar Jan 25 '22 07:01 xuzhangda-patsnap

Any fixes that work?

griff4692 avatar Feb 08 '22 19:02 griff4692

Facing the same problem on an A100. Reducing the batch size doesn't seem to help, since there are many documents in the index. It looks like an OOM: there is no problem if I disable --faiss-use-gpu, but then it runs very slowly.

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = __half; BT = __half; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (2, 768) x (524288, 768)' = (2, 524288) gemm params m 524288 n 2 k 768 trA T trB N lda 768 ldb 768 ldc 524288

My installations:

  • torch 1.7.1+cu110
  • faiss-cpu 1.7.2
  • faiss-gpu 1.7.2

Any workaround to this?

memray avatar Mar 25 '22 22:03 memray
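
One way to sanity-check the OOM guess in the comment above is to work out how much memory the three matrices named in the `details:` line occupy. The following is a minimal sketch, not a diagnosis: the helper name `gemm_operand_bytes` is hypothetical, and the element sizes assume `float` is 4 bytes and `__half` is 2 bytes, matching the `AT`/`BT` types printed in the assertion. Whether these buffers fit depends on what else is resident on the GPU, which the message does not show.

```python
import re

# Assumed element sizes for the AT/BT types printed in the assertion message.
BYTES_PER_ELEMENT = {"float": 4, "__half": 2}


def gemm_operand_bytes(details, dtype):
    """Hypothetical helper: bytes used by the A, B and output matrices in a 'details:' line."""
    # Matches "(rows, cols)" pairs; the status code "(13)" has no comma and is skipped.
    shapes = re.findall(r"\((\d+), (\d+)\)", details)
    if len(shapes) < 3:
        raise ValueError("expected three (rows, cols) shapes in the details line")
    size = BYTES_PER_ELEMENT[dtype]
    return [int(rows) * int(cols) * size for rows, cols in shapes[:3]]


details = "cublas failed (13): (2, 768) x (524288, 768)' = (2, 524288)"
a_bytes, b_bytes, c_bytes = gemm_operand_bytes(details, "__half")
total_mib = (a_bytes + b_bytes + c_bytes) / 2**20
print("A: %d B, B: %d B, output: %d B, total: %.1f MiB" % (a_bytes, b_bytes, c_bytes, total_mib))
```

The same call with `"float"` and the `details:` line from the original report gives the sizes for the `(512, 128) x (60000, 128)'` case.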

Did anyone find a solution for it?

athithya-raj avatar Jul 20 '22 07:07 athithya-raj