GPU OOMs on large directory scans
I'm crawling a large directory structure that contains tens of thousands of high-resolution images.
Using the CNN() method, it OOMs before it finishes the scan.
Traceback (most recent call last):
  File "/home/philglau/dedup_py/main.py", line 85, in <module>
    search('PyCharm')
  File "/home/philglau/dedup_py/main.py", line 50, in search
    encodings = cnn.encode_images(image_dir=image_dir,recursive=True,num_enc_workers=1)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/imagededup/methods/cnn.py", line 251, in encode_images
    return self._get_cnn_features_batch(image_dir=image_dir, recursive=recursive, num_workers=num_enc_workers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/imagededup/methods/cnn.py", line 146, in _get_cnn_features_batch
    arr = self.model(ims.to(self.device))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
I think the problem is that during the scan it reads all the images and, most importantly, runs apply_mobilenet_preprocess(), which performs a transform on each image on the GPU. However, those transforms don't seem to be consumed; in other words, it looks like it tries to scan the entire structure before proceeding to encode the results.
Or at least that's what it seems like to me. (Or perhaps it is doing the encoding, but some parts of GPU memory are not being released after being consumed.)
- CNN.encode_images() calls _get_cnn_features_batch
- _get_cnn_features_batch calls img_dataloader(image_dir='mypath')
- img_dataloader calls ImgDataset with a basenet_preprocess set
- ImgDataset.__getitem__ then applies self.basenet_preprocess
- self.basenet_preprocess then hits apply_mobilenet_preprocess()
- which calls self.transform(), which moves the data to the GPU
I believe it's all the apply_mobilenet_preprocess() calls that are filling the GPU before they have a chance to be consumed by the encoder (or at least that's my guess).
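To make that hypothesis concrete, here's a toy sketch of the two patterns I mean. This is my own illustration with made-up class names and a stand-in encoder, not imagededup's actual code: if __getitem__ already moves each sample to the GPU, prefetched samples sit in VRAM before the encoder consumes them, whereas the pattern I'd expect keeps samples on the CPU and only moves each batch to the GPU right before the forward pass.

# Toy illustration only -- class names are mine, not imagededup's.
import torch
from torch.utils.data import Dataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class GpuPreprocessDataset(Dataset):
    """The pattern I suspect: every sample lands on the GPU inside __getitem__,
    so prefetched samples occupy VRAM before the encoder ever sees them."""
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        img = torch.rand(3, 224, 224)   # stand-in for a decoded + resized image
        return img.to(device)           # <-- per-sample GPU transfer

class CpuPreprocessDataset(Dataset):
    """The pattern I expected: samples stay in host memory until the encode loop."""
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.rand(3, 224, 224)  # stays on the CPU

if __name__ == "__main__":
    model = torch.nn.Conv2d(3, 8, 3).to(device)  # stand-in for the encoder
    loader = DataLoader(CpuPreprocessDataset(), batch_size=16)
    with torch.no_grad():
        for batch in loader:
            feats = model(batch.to(device))      # batch-level transfer only
            feats = feats.cpu()                  # VRAM stays bounded per batch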
Here's a screenshot from nvtop while CNN.encode_images() is still scanning the directories:
Shortly thereafter, the encode_images() process crashes once the GPU (an RTX 3090 with 24GB of VRAM) goes OOM. I've tried adjusting the batch size with cnn.batch_size = 16 and other, lower numbers, but that doesn't make a difference; it still always OOMs.
As shown in the image, memory usage keeps increasing, but there is little or no compute occurring on the GPU while it fills up.
I removed the GPU, and it OOMs (main memory) when using NumPy as well. I set the number of workers between 0 and 6; the only difference is how quickly main memory fills up.
The directory has 1188 files and is 26GB in size (average image size is 1.8 MB)
# Both the GPU and CPU versions overfill their respective memories.
from imagededup.methods import CNN

image_dir = '/path/to/my_files'
cnn = CNN()
# Changing the number of workers between 0 and 6 only changes how long it takes to OOM:
# 0 = slower to OOM, 6 = faster to OOM.
# recursive doesn't affect it either.
encodings = cnn.encode_images(image_dir=image_dir, recursive=False, num_enc_workers=6)
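For now I'm considering a per-file fallback that builds the encodings dict myself. This is an untested sketch and assumes CNN.encode_image(image_file=...) returns a single encoding the way the docs describe:

# Untested per-file fallback; assumes CNN.encode_image(image_file=...) returns
# one encoding per image (or None if the file can't be processed).
from pathlib import Path
from imagededup.methods import CNN

image_dir = Path('/path/to/my_files')
cnn = CNN()

encodings = {}
for img_path in sorted(image_dir.rglob('*')):
    if img_path.suffix.lower() not in {'.jpg', '.jpeg', '.png', '.bmp'}:
        continue
    enc = cnn.encode_image(image_file=str(img_path))
    if enc is not None:
        encodings[img_path.name] = enc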
Your batch_size tuning makes a lot of sense and was indeed my first suspicion, but it's even weirder that it crashes without the GPU too. Could you test it on a really small set of images, maybe around 10? Also, which version of the package are you using? (There have been new releases in the last few days; it would be great if you could test on the latest one.)
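Something along these lines would cover both checks; the directory path is just a placeholder for a ~10-image subset on your machine:

# Report the installed version, then try a tiny directory (~10 images).
from importlib.metadata import version
from imagededup.methods import CNN

print("imagededup", version("imagededup"))

cnn = CNN()
encodings = cnn.encode_images(image_dir='/path/to/ten_image_subset', recursive=False)
print(len(encodings), "encodings computed")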