Summary

My code scores much higher (AUROC) when FAISS runs on the CPU than when it runs on the GPU. The difference is huge.

Platform

Ubuntu 20.04 LTS

Faiss version: 1.7.2

Installed from: pip install faiss-gpu (https://pypi.org/project/faiss-gpu/)

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0  On |                  N/A |
| 36%   29C    P8    39W / 370W |   1095MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1325      G   /usr/lib/xorg/Xorg                102MiB |
|    0   N/A  N/A      1906      G   /usr/lib/xorg/Xorg                523MiB |
|    0   N/A  N/A      2043      G   /usr/bin/gnome-shell               89MiB |
|    0   N/A  N/A      2436      G   ...697963766717379249,131072      361MiB |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 3

Running on:

[X] CPU
[X] GPU

Interface:

[ ] C++
[X] Python

Reproduction instructions

I use AUROC to measure the performance of my code. The Faiss part of the code looks like this:

import os

import faiss
import torch
from faiss.contrib import torch_utils  # lets Faiss indexes accept torch tensors directly

# embedding_dir and self come from the surrounding class (not shown)
device = torch.device('cuda')
bank_embeddings: torch.Tensor = torch.load(os.path.join(embedding_dir, 'bank_embeddings.pt'), map_location=device)
self.index = faiss.IndexFlatL2(bank_embeddings.shape[-1])
if device == torch.device('cuda'):
    provider = faiss.StandardGpuResources()  # use a single GPU
    self.index = faiss.index_cpu_to_gpu(provider, 0, self.index)  # make it a flat GPU index
self.index.add(bank_embeddings)

Next, I run the search:

D, I = self.index.search(n, 3)

With the CPU my AUROC is 0.9754; with the GPU it is 0.4675.

Is something wrong? The code is exactly the same in both cases. :S
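When the CPU and GPU disagree, an independent reference helps decide which one is wrong. IndexFlatL2 performs exact search and returns, for each query, the k smallest squared L2 distances (D) and the row ids of those database vectors (I). The following is a minimal NumPy sketch of the same computation; brute_force_l2_search is a hypothetical helper, not part of Faiss, and it assumes the embeddings have been moved to NumPy first (for example with bank_embeddings.cpu().numpy()).

```python
import numpy as np


def brute_force_l2_search(xb, xq, k):
    """Exact k-NN by squared L2 distance (hypothetical helper, mirrors IndexFlatL2.search).

    xb: (nb, d) database vectors, xq: (nq, d) queries.
    Returns (D, I): D[i, j] is the j-th smallest squared distance from xq[i],
    I[i, j] is the row of xb at that distance.
    """
    xb = np.asarray(xb, dtype=np.float64)
    xq = np.asarray(xq, dtype=np.float64)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for all pairs at once
    dist = (
        (xq ** 2).sum(axis=1)[:, None]
        - 2.0 * xq @ xb.T
        + (xb ** 2).sum(axis=1)[None, :]
    )
    np.maximum(dist, 0.0, out=dist)  # clip tiny negative values caused by rounding
    # stable sort: among tied distances, the lower database row comes first
    I = np.argsort(dist, axis=1, kind="stable")[:, :k]
    D = np.take_along_axis(dist, I, axis=1)
    return D, I
```

Comparing D_ref, I_ref = brute_force_l2_search(bank, queries, 3) against the CPU and GPU outputs with np.allclose (using loose tolerances, since Faiss computes in float32) would show which backend deviates. With a large bank, the full query-by-database distance matrix may need to be computed in chunks of queries to fit in memory.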

Mar 21 '22 08:03 mjack3

No, this is not normal. The only possible discrepancy is when there are duplicate vectors in the dataset, in which case their ordering in the result lists is random.
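Because tied (duplicate) vectors can come back in either order, comparing the I matrices of two backends element-wise can flag mismatches that are not errors. Below is a sketch of a tie-tolerant comparison, assuming NumPy arrays for the database xb, the queries xq and both (D, I) result pairs; knn_results_equivalent is a hypothetical helper. It requires the distance matrices to agree and checks that every returned id really lies at the reported distance, without comparing ids directly.

```python
import numpy as np


def knn_results_equivalent(xb, xq, D_a, I_a, D_b, I_b, tol=1e-3):
    """Hypothetical helper: True if two k-NN results agree up to the order of tied neighbours.

    Assumes squared-L2 distances and no -1 ids (k <= number of database vectors).
    """
    xb = np.asarray(xb, dtype=np.float64)
    xq = np.asarray(xq, dtype=np.float64)
    for D, I in ((D_a, I_a), (D_b, I_b)):
        D = np.asarray(D, dtype=np.float64)
        I = np.asarray(I)
        # recompute ||q_i - x_{I[i, j]}||^2 for every returned id
        recomputed = ((xq[:, None, :] - xb[I]) ** 2).sum(axis=-1)
        if not np.allclose(recomputed, D, rtol=tol, atol=tol):
            return False  # an id does not sit at the distance reported next to it
    # ids may differ between ties, but the sorted distances must match
    return np.allclose(np.asarray(D_a), np.asarray(D_b), rtol=tol, atol=tol)
```

Two duplicates returned as [0, 1] by one backend and [1, 0] by the other are accepted, while a result whose distances differ (as in the CPU and GPU D matrices in this thread) is rejected.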

Mar 21 '22 10:03 mdouze

I have tested this:

import os

import faiss
import torch
from faiss.contrib import torch_utils

device = torch.device('cuda')  # GPU device
# device = torch.device('cpu')  # CPU device

bank_embeddings: torch.Tensor = torch.load(os.path.join(embedding_dir, 'bank_embeddings.pt'), map_location=device)
if device == torch.device('cuda'):  # FIXME: the performance on the GPU is decreased
    cfg = faiss.GpuIndexFlatConfig()
    resources = faiss.StandardGpuResources()
    self.index = faiss.GpuIndexFlatL2(resources, bank_embeddings.shape[-1], cfg)
else:
    self.index = faiss.IndexFlatL2(bank_embeddings.shape[-1])

self.index.add(bank_embeddings)

Then the same search:

if not n.is_contiguous():
    n = n.contiguous()  # n shape is: (784, 1536)
D, I = self.index.search(n, 3)  # D shape is (784, 3)

D matrix when the GPU is used:

tensor([[15.5180, 18.6601, 19.2905],
        [18.4031, 21.5452, 22.1756],
        [20.4813, 23.6233, 24.2538],
        ...,
        [23.9322, 27.0743, 27.7047],
        [21.9629, 25.1050, 25.7354],
        [17.7017, 20.8438, 21.4742]], device='cuda:0')

D matrix when the CPU is used:

tensor([[4.4300, 6.7436, 6.9746],
        [4.1022, 4.9629, 6.6677],
        [4.2402, 6.0263, 7.2301],
        ...,
        [4.8605, 5.8493, 6.3380],
        [4.7552, 5.8216, 5.8592],
        [4.7130, 6.0770, 6.9383]])

The values are quite different.

Do you think this could be solved manually?
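One pattern in the printed GPU matrix is worth checking: in every row shown, the gaps between consecutive distances are the same (about 3.142 and 0.630), while the CPU gaps vary from row to row. For exact squared L2, the gap between two neighbours x_a and x_b of a query q is ||x_b||^2 - ||x_a||^2 - 2 q.(x_b - x_a), which normally changes with q, so identical gaps across different queries suggest (but do not prove) that the GPU distances do not depend on the query as expected. A quick NumPy check on the printed rows (the rows elided as "..." are not available and are left out; gap_spread is a hypothetical helper):

```python
import numpy as np

# Rows of D printed above; the "..." rows are not available and are left out.
D_gpu = np.array([
    [15.5180, 18.6601, 19.2905],
    [18.4031, 21.5452, 22.1756],
    [20.4813, 23.6233, 24.2538],
    [23.9322, 27.0743, 27.7047],
    [21.9629, 25.1050, 25.7354],
    [17.7017, 20.8438, 21.4742],
])
D_cpu = np.array([
    [4.4300, 6.7436, 6.9746],
    [4.1022, 4.9629, 6.6677],
    [4.2402, 6.0263, 7.2301],
    [4.8605, 5.8493, 6.3380],
    [4.7552, 5.8216, 5.8592],
    [4.7130, 6.0770, 6.9383],
])


def gap_spread(D):
    """For each pair of consecutive columns, how much D[:, j+1] - D[:, j] varies across rows."""
    gaps = np.diff(D, axis=1)
    return gaps.max(axis=0) - gaps.min(axis=0)


print("GPU gap spread:", gap_spread(D_gpu))  # close to 0: the same gaps in every row
print("CPU gap spread:", gap_spread(D_cpu))  # clearly non-zero
```

Running the same check on the full D, and looking at whether I contains the same ids in every row, would confirm or rule out this reading.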

Mar 21 '22 11:03 mjack3

Did you make any progress?

Mar 22 '22 20:03 mjack3

Could you repro with self-contained code?

Mar 31 '22 10:03 mdouze

Could you repro with self-contained code?

Sure. Give me until tomorrow :).

Thanks!

Mar 31 '22 11:03 mjack3

Hello @mdouze

I uploaded self-contained code. You just need to run main.py to execute the tests. Let me know if you can access it.

https://github.com/mjack3/FAISS_GPU_TEST

Note: I am using torch+CUDA==11.3, although requirements.txt specifies torch+CUDA==10.2. I don't think that makes a difference.

Thank you so much

Apr 01 '22 08:04 mjack3

Running your test on Faiss 1.7.2 and torch 1.10.1 yields https://gist.github.com/mdouze/df42bd8ad8c74c42fedd3fd1043e83b7

Note that the CUDA version does matter, and that pip is not a supported way of installing Faiss (use conda).

Apr 04 '22 07:04 mdouze

Regarding "pip is not a supported way of installing": surely that is the problem, then. My project environment is configured with pip. Do you plan to support pip?

Apr 04 '22 08:04 mjack3

In my case, I was experiencing the CPU-GPU difference with faiss-gpu==1.7.2 installed with pip. After uninstalling it and reinstalling faiss-gpu with conda (conda install -c pytorch faiss-gpu==1.7.3), the issue was resolved.

Dec 26 '23 19:12 roomo7time


Faiss GPU return different result than CPU
