detecto
CUDA out of memory error during model.fit()
I'm trying the basic example from https://towardsdatascience.com/build-a-custom-trained-object-detection-model-with-5-lines-of-code-713ba7f6c0fb
My video card is an NVIDIA GeForce MX150 (laptop) with 2 GB of video RAM. OS: Ubuntu 20.04 + NVIDIA driver 470.
I have 61 custom images with the marked object on them.
When I execute this simple code:
from detecto import core, utils, visualize
dataset = core.Dataset('images_to_learn/')
model = core.Model(['my_object'])
model.fit(dataset)
it fails at model.fit(dataset) with the following error:
Epoch 1 of 10
Begin iterating over training dataset
0%| | 0/61 [00:00<?, ?it/s]/home/xwizard/.local/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
2%|▊ | 1/61 [00:02<02:07, 2.12s/it]
Traceback (most recent call last):
File "/home/xwizard/test/main.py", line 24, in <module>
model.fit(dataset)
File "/home/xwizard/.local/lib/python3.9/site-packages/detecto/core.py", line 505, in fit
loss_dict = self._model(images, targets)
File "/home/xwizard/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/generalized_rcnn.py", line 96, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/xwizard/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/rpn.py", line 354, in forward
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 180, in decode
pred_boxes = self.decode_single(
File "/home/xwizard/.local/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 223, in decode_single
pred_boxes1 = pred_ctr_x - c_to_c_w
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 1.96 GiB total capacity; 1.12 GiB already allocated; 2.88 MiB free; 1.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
PyTorch just takes all the available memory and crashes.
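
Not part of the original report, but for context: on a 2 GB card the usual mitigations are an explicit batch size of 1 and the allocator option the error message itself suggests. A minimal sketch, assuming model.fit() also accepts a detecto core.DataLoader and that PYTORCH_CUDA_ALLOC_CONF is set before CUDA is first initialized:

# Sketch (not from this thread): train with batch size 1 and enable the
# allocator setting suggested in the error message. The environment variable
# must be set before CUDA is first used.
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'  # eases fragmentation; does not lower total usage

from detecto import core

dataset = core.Dataset('images_to_learn/')
loader = core.DataLoader(dataset, batch_size=1, shuffle=True)  # assumption: fit() accepts a DataLoader
model = core.Model(['my_object'])
model.fit(loader)
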
Could you try some of the solutions listed in this post to see if any of those help?
By adding:

import gc
del dataset
gc.collect()

right before creating and running my dataset, this fixed the issue. Hope this helps @TimurNurlygayanov
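
Not from the original comment, but the idea reads as freeing a dataset left over from an earlier run in the same interpreter or notebook session before building a new one. A rough sketch under that assumption; the torch.cuda.empty_cache() call is an extra step not mentioned in the thread:

# Hypothetical sketch: assumes `dataset` may still exist from a previous run
# in the same Python/notebook session and is holding memory.
import gc
import torch
from detecto import core

try:
    del dataset               # drop the stale dataset from the earlier run
except NameError:
    pass                      # nothing to free on a fresh start
gc.collect()                  # let Python reclaim the object
torch.cuda.empty_cache()      # optional: release PyTorch's cached GPU memory

dataset = core.Dataset('images_to_learn/')
model = core.Model(['my_object'])
model.fit(dataset)
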