Out of Memory Error when running on GPU
Summary
Hi, I want to run an exact nearest-neighbor (NN) search over 11M samples with 512 features each, so I have a feature matrix of shape 11M x 512.
Platform
Static hostname: u124281
Icon name: computer-server
Chassis: server
Machine ID: 1a347b1c907c42bb81d003b8876d5b8b
Boot ID: eb89be2b096e4d43a95e77b6b9bc735d
Operating System: Ubuntu 20.04.4 LTS
Kernel: Linux 5.4.0-117-generic
Architecture: x86-64
GPU specifications, obtained via the nvidia-smi command:
Mon Sep 11 00:39:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   27C    P0    60W / 300W |   6381MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   67C    P0   295W / 300W |  30068MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100 80G...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   27C    P0    59W / 300W |      7MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100 80G...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   27C    P0    44W / 300W |      7MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A   1410357      C   python                            411MiB |
|    0   N/A  N/A   1865828      C   ...a/envs/pytorch/bin/python     5963MiB |
|    1   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A   1410357      C   python                          30061MiB |
|    2   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
Faiss version:
faiss-gpu 1.7.3 py3.9_h28a55e0_0_cuda11.3 pytorch
libfaiss 1.7.3 hfc2d529_0_cuda11.3 pytorch
Installed from: conda 23.1.0
Faiss compilation options:
Running on: GPU
Interface: Python
Reproduction instructions
The following is a minimal working example for the issue. It uses a random matrix, whereas the real codebase uses a matrix of features -
import logging

import faiss
import numpy as np

logger = logging.getLogger(__name__)
logging.basicConfig(format="%(asctime)s %(levelname)-8s %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S")

# The real feature matrix has 11060223 rows; 1000 rows keep this example small.
data = np.random.rand(1000, 512).astype("float32")
logger.info(f"Data loaded. Shape = {data.shape}")
num_columns = data.shape[1]

# Leave one thread free; compute the count once so the log matches the setting.
num_threads = max(1, faiss.omp_get_max_threads() - 1)
faiss.omp_set_num_threads(num_threads)
logger.info(f"Using {num_threads} threads.")

# Exact (brute-force) L2 index, replicated on every visible GPU.
cpu_index = faiss.IndexFlatL2(num_columns)
k_nearest_neighbors = 2047
number_of_gpus = faiss.get_num_gpus()
logger.info(f"Running on {number_of_gpus} GPUs.")
index = faiss.index_cpu_to_all_gpus(cpu_index)
index.add(data)

logger.info("Finding nearest neighbors.")
# With the full dataset, this is the call that raises the out-of-memory error.
similarities, indices = index.search(data, k_nearest_neighbors)
Logs -
[2023-09-11 00:33:33,277][faiss.loader][INFO] - Loading faiss with AVX2 support.
[2023-09-11 00:33:33,296][faiss.loader][INFO] - Successfully loaded faiss with AVX2 support.
Faiss sparse similarity/distance matrix does not exist - hence computing it!
[2023-09-11 00:33:33,300][utils.faiss_mat][INFO] - Loading data.
[2023-09-11 00:34:06,923][utils.faiss_mat][INFO] - Data loaded. Shape = (11060223, 512)
[2023-09-11 00:34:06,924][utils.faiss_mat][INFO] - Loading landmarks.
[2023-09-11 00:34:49,933][utils.faiss_mat][INFO] - Landmarks loaded. Shape = (11060223, 512)
[2023-09-11 00:34:49,934][utils.faiss_mat][INFO] - Using 18 threads.
[2023-09-11 00:34:49,934][utils.faiss_mat][INFO] - Running on 4 GPUs.
[2023-09-11 00:35:50,087][utils.faiss_mat][INFO] - Not normalizing matrix as metric being used is simeuclid
[2023-09-11 00:35:59,110][utils.faiss_mat][INFO] - Finding nearest neighbors.
Error executing job with overrides: ['seed=5', 'use_ffcv=false', 'dataset=imagenet21k', 'batch_size=256', 'num_classes=10450', 'phase=calibration', 'summary_parameters.fraction=0.001', '+use_gpu_faiss=true', '+faiss_knn=2047', 'summary_parameters.feat_type_list=[clip_vit_b_32]', 'summary_parameters.feat_mode_list=[activation]', 'summary_parameters.sparse_type=zcopblock_precalc_sparse', 'recalculate_params.sparsification_clustering=true', 'summary_parameters.smraiz_constrains=[partition_matroid]', 'summary_parameters.fn_type=smraiz', 'summary_parameters.sim_type=simeuclid', 'summary_parameters.use_sparse_representation=true', 'summary_parameters.feat_responsibilities=[10]', 'submod_max_algo.type=stochastic_greedy', 'submod_max_algo.eps=1e-10', 'submod_max_algo.log_iter=1000', 'eval_mode=semantic_softmax', 'num_eval=1', 'use_saved_data=false', 'summarization_strategy=whole', 'summary_parameters.knn=1000', 'root_data_dir=/data/megh98/projects/datasets/imagenet21k/imagenet21k_resized/']
Traceback (most recent call last):
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/main.py", line 286, in main
images, subset_labels, summary_elements = summarize.get_summary()
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/summary.py", line 95, in get_summary
summary_elements = smraiz_obj.get_smraiz_summary()
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_sum.py", line 324, in get_smraiz_summary
self.similarity_matrix, self.sim_filename = self.get_similarity_mat(feat_type, feat_mode, feat_basename)
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_sum.py", line 179, in get_similarity_mat
self.sim_filename = make_sim_or_dist_dist_file(filename=self.sim_filename, sim_type=self.sim_type, dname=self.dname, config_dict=self.config_dict, knn_k = self.knn_k, use_gpu_faiss=self.use_gpu_faiss, feat_filename=self.feat_filename)
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_utils.py", line 108, in make_sim_or_dist_dist_file
filename = construct_simmat_faiss(filename, faiss_knn, feat_filename, sim_type, use_gpu_faiss)
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_utils.py", line 134, in construct_simmat_faiss
sim_filename_faiss = construct_sparse_similarity_matrix(data_file=feat_filename, landmark_file=feat_filename, output_file=sim_filename_faiss, k_nearest_neighbors=faiss_knn, metric=sim_type, use_gpu=use_gpu_faiss)
File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/utils/faiss_mat.py", line 166, in construct_sparse_similarity_matrix
similarities, indices = index.search(data, k_nearest_neighbors)
File "/home/megh98/anaconda3/envs/imagenet/lib/python3.9/site-packages/faiss/class_wrappers.py", line 343, in replacement_search
self.search_c(n, swig_ptr(x), k, swig_ptr(D), swig_ptr(I), params)
File "/home/megh98/anaconda3/envs/imagenet/lib/python3.9/site-packages/faiss/swigfaiss_avx2.py", line 9400, in search
return _swigfaiss_avx2.IndexReplicas_search(self, n, x, k, distances, labels, params)
RuntimeError: Exception thrown from index 0: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 0 space Device stream 0xb782520 size 45280557056 bytes (cudaMalloc error out of memory [2])
Exception thrown from index 1: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 1 space Device stream 0x83679830 size 45280557056 bytes (cudaMalloc error out of memory [2])
Exception thrown from index 2: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 2 space Device stream 0xa01bef50 size 45280557056 bytes (cudaMalloc error out of memory [2])
Exception thrown from index 3: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 3 space Device stream 0xbcb1b640 size 45280540928 bytes (cudaMalloc error out of memory [2])
Is there some way to do exact NN search, i.e. still using IndexFlatL2, but computing the distances in batches so that it does not run out of memory, if that is indeed the cause of the error? I am a little unsure of the cause.
The data is about 22GB and each GPU has 80GB of memory, so I expected the data to fit in GPU memory, and I am not sure what the problem is.
Please do let me know if I am missing anything and thank you very much for your time!
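The allocation sizes in the error log above can be checked against the problem dimensions with a little arithmetic. The sketch below assumes (this is not confirmed in the thread) that the replicated index splits the queries evenly across the 4 GPUs and that the failing buffer holds 8 bytes per (query, neighbor) pair, which would fit an int64 label array. Under those assumptions the numbers match the log exactly, which suggests that the result buffers for k = 2047, rather than the 22GB of data, are what overflow.

```python
# Sizes taken from the issue text and the error log.
n_queries = 11_060_223
dim = 512
k = 2047
n_gpus = 4

# Database size as float32 vectors (the "22GB" quoted in the post).
data_bytes = n_queries * dim * 4

# Assumption: the replicas split the query set evenly, one slice per GPU.
queries_per_gpu = -(-n_queries // n_gpus)  # ceiling division

# Assumption: 8 bytes per (query, neighbor) pair, e.g. one int64 label.
alloc_bytes = queries_per_gpu * k * 8

print(f"data: {data_bytes / 1e9:.2f} GB")
print(f"queries per GPU: {queries_per_gpu}")
print(f"per-GPU buffer: {alloc_bytes} bytes")
print(f"matches the size in the log: {alloc_bytes == 45_280_557_056}")
```

Each replica also holds its own copy of the data, so this buffer comes on top of roughly 22GB per GPU, and GPUs 0 and 1 were already partly occupied by other processes.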
Actually, I think I might have figured out how to do it: I can chunk my queries into batches, loop over those chunks, and call index.search(chunk, kneighbors) on each one, which would go something like -
index.add(whole_data)
# chunk_data is a placeholder for any row-wise splitter, e.g. np.array_split
chunks = chunk_data(whole_data, num_chunks)
for chunk in chunks:
    # each call returns (distances, indices) for this chunk only
    distances, indices = index.search(chunk, kneighbors)
Yes, please do. We recently introduced batching for large GPU queries, but it is not available everywhere.
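The batching approach discussed above can be sketched as a small helper that slices the queries and writes each batch's results into preallocated arrays. `search_in_batches` and `BruteForceL2` are hypothetical names introduced here for illustration. `BruteForceL2` is only a NumPy stand-in so that the sketch runs without a GPU; the helper assumes nothing about the index beyond a `search(x, k)` method that returns `(distances, labels)`, as Faiss indexes do in Python.

```python
import numpy as np


def search_in_batches(index, queries, k, batch_size):
    """Call index.search on slices of at most batch_size query rows."""
    n = queries.shape[0]
    distances = np.empty((n, k), dtype=np.float32)
    labels = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        d, i = index.search(queries[start:stop], k)
        distances[start:stop] = d
        labels[start:stop] = i
    return distances, labels


class BruteForceL2:
    """NumPy stand-in for an exact L2 index (illustration only)."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, x):
        self.vectors = np.vstack([self.vectors, x])

    def search(self, x, k):
        # Squared L2 distance between every query and every stored vector.
        d2 = ((x[:, None, :] - self.vectors[None, :, :]) ** 2).sum(axis=2)
        order = np.argsort(d2, axis=1)[:, :k]
        nearest = np.take_along_axis(d2, order, axis=1)
        return nearest.astype(np.float32), order.astype(np.int64)


rng = np.random.default_rng(0)
data = rng.random((200, 16), dtype=np.float32)
index = BruteForceL2(16)
index.add(data)

distances, labels = search_in_batches(index, data, k=5, batch_size=64)
full_distances, full_labels = index.search(data, 5)
```

With a real Faiss index, the same helper would be called as `search_in_batches(index, data, k_nearest_neighbors, batch_size)`. For the sizes in this thread, the two preallocated host arrays would need about 11060223 x 2047 x 12 bytes, roughly 272 GB, so writing each batch to disk instead of keeping everything in memory may be the more practical choice.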
@meghbhalerao where did you find docs for chunking?
I don't think I used any docs. The code I posted above simply does the matrix-matrix multiply in chunks rather than all at once.
I am using the chunk of code you had in your response and I still get the error from your original post.
You might have to reduce the chunk size so that each batch fits in the memory of your GPU. It might also be that your index is too large.
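One rough way to pick a chunk size is to cap the per-batch result buffers at a fixed memory budget. The sketch below is a heuristic, not a Faiss API: `max_batch_size` and the 2 GiB budget are hypothetical, and the 12 bytes per (query, neighbor) pair assumes a float32 distance plus an int64 label. It ignores any other temporary memory Faiss uses internally, so the result is an upper bound to start from and reduce if the error persists.

```python
def max_batch_size(k, budget_bytes, bytes_per_result=12):
    """Largest number of queries whose k results fit in budget_bytes."""
    if k <= 0 or budget_bytes <= 0:
        raise ValueError("k and budget_bytes must be positive")
    return max(1, budget_bytes // (k * bytes_per_result))


k = 2047
n_queries = 11_060_223
budget = 2 * 1024**3  # hypothetical 2 GiB budget for result buffers

batch_size = max_batch_size(k, budget)
n_batches = -(-n_queries // batch_size)  # ceiling division

print(f"batch size: {batch_size}, number of batches: {n_batches}")
```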