Faiss GPU returns different results than CPU
Summary
My code scores much higher when Faiss runs on the CPU than when it runs on the GPU. The difference is huge.
Platform
Ubuntu 20.04 LTS
Faiss version: 1.7.2
Installed from: pip install faiss-gpu (https://pypi.org/project/faiss-gpu/)
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01 Driver Version: 510.39.01 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:0A:00.0 On | N/A |
| 36% 29C P8 39W / 370W | 1095MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1325 G /usr/lib/xorg/Xorg 102MiB |
| 0 N/A N/A 1906 G /usr/lib/xorg/Xorg 523MiB |
| 0 N/A N/A 2043 G /usr/bin/gnome-shell 89MiB |
| 0 N/A N/A 2436 G ...697963766717379249,131072 361MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 3
Running on:
- [X] CPU
- [X] GPU
Interface:
- [ ] C++
- [X] Python
Reproduction instructions
I am using AUROC to measure the performance of my code. The Faiss part of the code looks like this:
import os

import faiss
import torch
from faiss.contrib import torch_utils  # lets Faiss indexes accept torch tensors

device = torch.device('cuda')
bank_embeddings: torch.Tensor = torch.load(os.path.join(embedding_dir, 'bank_embeddings.pt'), map_location=device)
self.index = faiss.IndexFlatL2(bank_embeddings.shape[-1])
if device == torch.device('cuda'):
    provider = faiss.StandardGpuResources()  # use a single GPU
    self.index = faiss.index_cpu_to_gpu(provider, 0, self.index)  # make it a flat GPU index
self.index.add(bank_embeddings)
Next, I run the search:
D, I = self.index.search(n, 3)
If I use the CPU, my AUROC score is 0.9754. If I use the GPU, my AUROC score is 0.4675.
Is something wrong? The code is the same in both cases :S
No, this is not normal. The only expected discrepancy is when the dataset contains duplicate vectors, because their ordering in the result lists is arbitrary.
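To separate a harmless tie-ordering difference from a real mismatch, one option is to compare distances and ids separately. The sketch below assumes both results are already NumPy arrays (for torch tensors, call .cpu().numpy() first); compare_knn and its tolerances are hypothetical names and values chosen for illustration, not part of Faiss.

```python
import numpy as np


def compare_knn(D_a, I_a, D_b, I_b, rtol=1e-3, atol=1e-4):
    """Compare two k-NN results.

    Returns (distances_match, id_mismatch_fraction). Duplicate vectors can
    make ids differ while distances still match; mismatching distances
    point to a real problem.
    """
    D_a = np.asarray(D_a, dtype=np.float64)
    D_b = np.asarray(D_b, dtype=np.float64)
    I_a = np.asarray(I_a)
    I_b = np.asarray(I_b)
    # Distances should agree rank by rank, up to floating-point tolerance.
    distances_match = D_a.shape == D_b.shape and bool(
        np.allclose(D_a, D_b, rtol=rtol, atol=atol)
    )
    # Fraction of result slots whose ids differ (1.0 if shapes disagree).
    if I_a.shape == I_b.shape:
        id_mismatch_fraction = float(np.mean(I_a != I_b))
    else:
        id_mismatch_fraction = 1.0
    return distances_match, id_mismatch_fraction
```

If distances_match is True and only the ids differ, the cause is most likely duplicates. If the distances themselves differ, as in the matrices posted in this thread, tie ordering cannot explain it.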
I have tested this:
import os

import faiss
import torch
from faiss.contrib import torch_utils  # lets Faiss indexes accept torch tensors

device = torch.device('cuda')  # GPU device
# device = torch.device('cpu')  # CPU device
bank_embeddings: torch.Tensor = torch.load(os.path.join(embedding_dir, 'bank_embeddings.pt'), map_location=device)
if device == torch.device('cuda'):  # FIXME: the score drops on the GPU
    cfg = faiss.GpuIndexFlatConfig()
    resources = faiss.StandardGpuResources()
    self.index = faiss.GpuIndexFlatL2(resources, bank_embeddings.shape[-1], cfg)
else:
    self.index = faiss.IndexFlatL2(bank_embeddings.shape[-1])
self.index.add(bank_embeddings)
Next, the same search:
if not n.is_contiguous():
    n = n.contiguous()  # n shape is (784, 1536)
D, I = self.index.search(n, 3)  # D shape is (784, 3)
D matrix if the GPU is used:
tensor([[15.5180, 18.6601, 19.2905],
[18.4031, 21.5452, 22.1756],
[20.4813, 23.6233, 24.2538],
...,
[23.9322, 27.0743, 27.7047],
[21.9629, 25.1050, 25.7354],
[17.7017, 20.8438, 21.4742]], device='cuda:0')
D matrix if CPU is used:
tensor([[4.4300, 6.7436, 6.9746],
[4.1022, 4.9629, 6.6677],
[4.2402, 6.0263, 7.2301],
...,
[4.8605, 5.8493, 6.3380],
[4.7552, 5.8216, 5.8592],
[4.7130, 6.0770, 6.9383]])
The values are quite different.
Do you think this could be solved manually?
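One way to find out which backend is wrong is to compare both against a brute-force computation. In the GPU rows shown above, the gaps between columns are the same in every row (about 3.142 and 0.630), which would be unusual for genuine nearest-neighbor distances. The sketch below is a plain NumPy reference, assuming the embeddings fit in memory as arrays; brute_force_l2 is a hypothetical helper. It returns squared L2 distances, which is what IndexFlatL2 reports.

```python
import numpy as np


def brute_force_l2(queries, bank, k):
    """Exact k nearest neighbors by squared L2 distance, computed in float64."""
    q = np.asarray(queries, dtype=np.float64)
    x = np.asarray(bank, dtype=np.float64)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, evaluated for all pairs at once.
    d2 = (q ** 2).sum(axis=1, keepdims=True) - 2.0 * (q @ x.T) + (x ** 2).sum(axis=1)
    # Clamp tiny negative values caused by rounding.
    np.maximum(d2, 0.0, out=d2)
    # Stable sort so that ties keep the lower id first.
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx
```

Running this on a handful of query rows and comparing the result with the D returned by each index shows which of the two deviates from the exact answer.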
Did you make any progress?
Could you repro with self-contained code?
Sure. Give me until tomorrow :).
Thanks!
Hello @mdouze
I uploaded self-contained code. You just need to run main.py to execute the tests. Let me know if you can access it properly.
https://github.com/mjack3/FAISS_GPU_TEST
Note: I am using torch+CUDA==11.3, although requirements.txt lists torch+CUDA==10.2. I think that won't make a difference.
Thank you so much
Running your test on Faiss 1.7.2 and torch 1.10.1 yields https://gist.github.com/mdouze/df42bd8ad8c74c42fedd3fd1043e83b7
Note that the CUDA version does matter and that pip is not a supported way of installing Faiss (use conda).
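Since the CUDA version and the install channel both matter, it helps to attach the exact versions to a report like this one. The sketch below collects them defensively, so it also runs when Faiss or torch is missing; environment_report is a hypothetical helper, and the get_num_gpus lookup is guarded because CPU-only builds may not expose it.

```python
import importlib


def environment_report():
    """Collect version info relevant to CPU/GPU mismatches (None if unavailable)."""
    report = {"faiss": None, "faiss_gpus": None, "torch": None, "torch_cuda": None}
    try:
        faiss = importlib.import_module("faiss")
        report["faiss"] = getattr(faiss, "__version__", "unknown")
        # Guarded lookup: the function may be absent in CPU-only builds.
        get_num_gpus = getattr(faiss, "get_num_gpus", None)
        if callable(get_num_gpus):
            report["faiss_gpus"] = get_num_gpus()
    except Exception:
        pass  # Faiss missing or broken: leave the entries as None.
    try:
        torch = importlib.import_module("torch")
        report["torch"] = str(torch.__version__)
        report["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
    except Exception:
        pass  # torch missing or broken: leave the entries as None.
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A mismatch between the CUDA version torch was built with and the one the Faiss package expects is worth ruling out first.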
"pip is not a supported way of installing"
That is surely the problem. My project environment is configured with pip. Do you plan to support pip?
In my case, I was experiencing the CPU/GPU difference with faiss-gpu==1.7.2, which I had installed with pip. After uninstalling it and reinstalling faiss-gpu through conda (conda install -c pytorch faiss-gpu==1.7.3), the issue was resolved.
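After reinstalling, a quick check on random data can confirm that the CPU and GPU flat indexes agree. This is a minimal sketch, assuming a Faiss build with GPU support and at least one visible GPU; cpu_gpu_smoke_test, its sizes, and its tolerances are illustrative choices. It returns None when no GPU build is available, so it can be run anywhere.

```python
import numpy as np


def cpu_gpu_smoke_test(d=64, n_bank=1000, n_query=16, k=3, seed=0):
    """Return True if CPU and GPU flat L2 distances agree, None if no GPU Faiss."""
    try:
        import faiss
    except ImportError:
        return None
    # CPU-only builds lack the GPU classes; skip instead of failing.
    if not hasattr(faiss, "StandardGpuResources") or faiss.get_num_gpus() == 0:
        return None

    rng = np.random.default_rng(seed)
    bank = rng.random((n_bank, d), dtype=np.float32)
    queries = rng.random((n_query, d), dtype=np.float32)

    cpu_index = faiss.IndexFlatL2(d)
    cpu_index.add(bank)

    resources = faiss.StandardGpuResources()  # single GPU
    gpu_index = faiss.index_cpu_to_gpu(resources, 0, cpu_index)  # copy to GPU 0

    D_cpu, _ = cpu_index.search(queries, k)
    D_gpu, _ = gpu_index.search(queries, k)
    # Compare distances only: ids may legitimately differ on exact ties.
    return bool(np.allclose(D_cpu, D_gpu, rtol=1e-3, atol=1e-3))


if __name__ == "__main__":
    print(cpu_gpu_smoke_test())
```

A False result on random data would suggest an installation problem rather than anything specific to the embeddings.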