Faiss GPU return different result than CPU

Open mjack3 opened this issue 2 years ago • 9 comments

Summary

My code gives much better results (AUROC) when Faiss runs on the CPU than when it runs on the GPU. The difference is huge.

Platform

Ubuntu 20.04 LTS

Faiss version: 1.7.2

Installed from: pip install faiss-gpu https://pypi.org/project/faiss-gpu/

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0  On |                  N/A |
| 36%   29C    P8    39W / 370W |   1095MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1325      G   /usr/lib/xorg/Xorg                102MiB |
|    0   N/A  N/A      1906      G   /usr/lib/xorg/Xorg                523MiB |
|    0   N/A  N/A      2043      G   /usr/bin/gnome-shell               89MiB |
|    0   N/A  N/A      2436      G   ...697963766717379249,131072      361MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 3

Running on:

  • [X] CPU
  • [X] GPU

Interface:

  • [ ] C++
  • [X] Python

Reproduction instructions

I am using AUROC to measure the performance of some code. The Faiss part of the code looks like this:

import os

import faiss
import torch
from faiss.contrib import torch_utils  # lets Faiss indexes accept torch tensors directly


device = torch.device('cuda')
bank_embeddings: torch.Tensor = torch.load(os.path.join(embedding_dir, 'bank_embeddings.pt'), map_location=device)
self.index = faiss.IndexFlatL2(bank_embeddings.shape[-1])  # exact (brute-force) L2 index on CPU
if device == torch.device('cuda'):
    provider = faiss.StandardGpuResources()  # use a single GPU
    self.index = faiss.index_cpu_to_gpu(provider, 0, self.index)  # make it a flat GPU index
self.index.add(bank_embeddings)

Next, I run the search:

D, I = self.index.search(n, 3)

With the CPU, my AUROC is 0.9754. With the GPU, it is 0.4675.

Is something wrong? The code is identical in both cases :S

mjack3, Mar 21 '22

No, this is not normal. The only expected discrepancy arises when the dataset contains duplicate vectors, whose order in the result lists is then random.
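Because tied (duplicate) vectors sit at the same distance from the query, a tie-induced reordering changes the index matrix I but not the distance matrix D. One way to tell a harmless reordering from a real bug is therefore to compare distances rather than indices. A minimal NumPy sketch; `compare_knn` is a hypothetical helper, the tolerances are arbitrary, and torch results would first need converting (e.g. `D.cpu().numpy()`).

```python
import numpy as np


def compare_knn(D_a, I_a, D_b, I_b, rtol=1e-4, atol=1e-3):
    """Hypothetical helper comparing two k-NN results of the same shape.

    Returns (distances_agree, index_mismatch_fraction). If the distances agree,
    any index mismatch means two different vectors lie at the same distance,
    i.e. a tie whose ordering is arbitrary.
    """
    D_a = np.asarray(D_a, dtype=np.float64)
    D_b = np.asarray(D_b, dtype=np.float64)
    I_a = np.asarray(I_a)
    I_b = np.asarray(I_b)
    distances_agree = bool(np.allclose(D_a, D_b, rtol=rtol, atol=atol))
    index_mismatch = float(np.mean(I_a != I_b))  # fraction of differing positions
    return distances_agree, index_mismatch


# Usage: ranks 2 and 3 are tied at distance 2.0, so swapped indices are harmless.
ok, frac = compare_knn([[1.0, 2.0, 2.0]], [[5, 7, 9]],
                       [[1.0, 2.0, 2.0]], [[5, 9, 7]])
```

If `distances_agree` is True while `index_mismatch` is nonzero, the difference is consistent with ties. When the distances themselves differ, as in the CPU and GPU D matrices reported later in this thread, ties cannot explain it.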

mdouze, Mar 21 '22

I have tested this:

import os

import faiss
import torch
from faiss.contrib import torch_utils  # lets Faiss indexes accept torch tensors directly

device = torch.device('cuda')  # GPU device
# device = torch.device('cpu')  # CPU device

bank_embeddings: torch.Tensor = torch.load(os.path.join(embedding_dir, 'bank_embeddings.pt'), map_location=device)
if device == torch.device('cuda'):  # FIXME: the performance in GPU is decreased
    cfg = faiss.GpuIndexFlatConfig()
    resources = faiss.StandardGpuResources()
    self.index = faiss.GpuIndexFlatL2(resources, bank_embeddings.shape[-1], cfg)
else:
    self.index = faiss.IndexFlatL2(bank_embeddings.shape[-1])

self.index.add(bank_embeddings)

Then the same search:

if not n.is_contiguous():
    n = n.contiguous()  # n shape is: (784, 1536)
D, I = self.index.search(n, 3)  # D shape is (784, 3)

D matrix when the GPU is used:

tensor([[15.5180, 18.6601, 19.2905],
        [18.4031, 21.5452, 22.1756],
        [20.4813, 23.6233, 24.2538],
        ...,
        [23.9322, 27.0743, 27.7047],
        [21.9629, 25.1050, 25.7354],
        [17.7017, 20.8438, 21.4742]], device='cuda:0')

D matrix when the CPU is used:

tensor([[4.4300, 6.7436, 6.9746],
        [4.1022, 4.9629, 6.6677],
        [4.2402, 6.0263, 7.2301],
        ...,
        [4.8605, 5.8493, 6.3380],
        [4.7552, 5.8216, 5.8592],
        [4.7130, 6.0770, 6.9383]])

The values are quite different.

Do you think this could be solved manually?
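`IndexFlatL2` reports squared L2 distances (no square root), so either D matrix can be checked against an independent brute-force computation on a few queries to see which backend is correct. A minimal NumPy sketch; `exact_knn_l2` is a hypothetical helper, and `xb`/`xq` stand for `bank_embeddings` and a few rows of `n` moved to the CPU (e.g. `n[:10].cpu().numpy()`).

```python
import numpy as np


def exact_knn_l2(xb, xq, k):
    """Exact k-NN under squared L2 distance, the metric IndexFlatL2 reports.

    Returns (D, I), both of shape (nq, k), sorted by increasing distance.
    """
    xb = np.asarray(xb, dtype=np.float64)  # database vectors, shape (nb, d)
    xq = np.asarray(xq, dtype=np.float64)  # query vectors, shape (nq, d)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for all pairs at once
    d2 = (np.sum(xq ** 2, axis=1)[:, None]
          - 2.0 * xq @ xb.T
          + np.sum(xb ** 2, axis=1)[None, :])
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives caused by rounding
    I = np.argsort(d2, axis=1, kind="stable")[:, :k]
    D = np.take_along_axis(d2, I, axis=1)
    return D, I
```

Compare the returned D with the first rows of the CPU and GPU D matrices. The intermediate matrix has nq × nb entries, so keep the query subset small when the bank is large.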

mjack3, Mar 21 '22

Have you made any progress?

mjack3, Mar 22 '22

Could you repro with self-contained code?
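Such a repro can be small: random float32 data, the same `IndexFlatL2` searched on the CPU and on the GPU (via `index_cpu_to_gpu`, as in the original snippet), and a comparison of the two distance matrices. A minimal sketch, assuming NumPy and a GPU-enabled Faiss build; the function name and default sizes are arbitrary, and it returns None when Faiss or a GPU is unavailable.

```python
import numpy as np

try:
    import faiss  # optional, so the sketch degrades gracefully without Faiss
except ImportError:
    faiss = None


def cpu_gpu_max_abs_diff(d=64, nb=10000, nq=100, k=3, seed=0):
    """Search identical random data with IndexFlatL2 on CPU and on GPU.

    Returns the largest absolute difference between the two distance
    matrices, or None when Faiss or a GPU is not available.
    """
    if faiss is None or getattr(faiss, "get_num_gpus", lambda: 0)() == 0:
        return None
    rng = np.random.default_rng(seed)
    xb = rng.random((nb, d), dtype=np.float32)  # database vectors
    xq = rng.random((nq, d), dtype=np.float32)  # query vectors

    cpu_index = faiss.IndexFlatL2(d)
    cpu_index.add(xb)
    D_cpu, _ = cpu_index.search(xq, k)

    res = faiss.StandardGpuResources()  # single GPU, device 0
    gpu_index = faiss.index_cpu_to_gpu(res, 0, faiss.IndexFlatL2(d))
    gpu_index.add(xb)
    D_gpu, _ = gpu_index.search(xq, k)

    return float(np.abs(D_cpu - D_gpu).max())


if __name__ == "__main__":
    print("max |D_cpu - D_gpu| =", cpu_gpu_max_abs_diff())
```

Small differences from float32 rounding are expected; differences on the scale of the D matrices reported in this thread are not.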

mdouze, Mar 31 '22

Could you repro with self-contained code?

Sure. Give me until tomorrow :).

Thanks!

mjack3, Mar 31 '22

Hello @mdouze

I uploaded self-contained code. You just need to run main.py to execute the tests. Let me know if you can access it.

https://github.com/mjack3/FAISS_GPU_TEST

Note: I am using torch with CUDA 11.3, although requeriments.txt specifies torch with CUDA 10.2. I don't think that makes a difference.

Thank you so much

mjack3, Apr 1 '22

Running your test on Faiss 1.7.2 and torch 1.10.1 yields https://gist.github.com/mdouze/df42bd8ad8c74c42fedd3fd1043e83b7

Note that the CUDA version does matter, and that pip is not a supported way of installing Faiss (use conda).
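Since the install channel and CUDA version matter here, it helps to record exactly what is installed when reporting such a problem. A minimal sketch using only the standard library; the helper name and package list are assumptions. A faiss-gpu wheel installed with pip appears under the distribution name `faiss-gpu`, but a conda-installed Faiss may not expose pip metadata under that name, so None is not proof that Faiss is absent.

```python
from importlib import metadata


def installed_versions(packages=("faiss-gpu", "faiss-cpu", "torch", "numpy")):
    """Return {distribution name: version string or None} from pip metadata."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed, or installed without pip-visible metadata
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

The CUDA version that torch was built against is available separately as `torch.version.cuda`.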

mdouze, Apr 4 '22

"pip is not a supported way of installing": that must be the problem, then. My project environment is configured with pip. Do you plan to support pip?

mjack3, Apr 4 '22

In my case, I was seeing the CPU/GPU discrepancy with faiss-gpu==1.7.2 installed via pip. After uninstalling it and reinstalling faiss-gpu via conda with conda install -c pytorch faiss-gpu==1.7.3, the issue was resolved.

roomo7time, Dec 26 '23