CUDA out of memory for larger dataset.
I am able to train yolov7 on a small dataset, but when I increase the dataset size from ~300 images to about 2000 it breaks with the error below. I use the exact same training parameters. Any idea why the size of the dataset would matter? I tried batch size 1 with the same result. I have a 2080 Ti, so I shouldn't be running into memory issues here.
python train.py --workers 1 --batch-size 1 --data academy2/cfg.yaml --img 640 640 --cfg academy2/yolov7.yaml --weights yolov7.pt --name yolov7-academy2 --hyp academy2/hyp.scratch.p5.yaml
Epoch gpu_mem box obj cls total labels img_size
0/299 10.7G 0.08955 0.09489 0.07693 0.2614 172 640: 0%|▎ | 5/1887 [00:17<1:52:11, 3.58s/it]
Traceback (most recent call last):
File "train.py", line 616, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 363, in train
loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs) # loss scaled by batch_size
File "C:\Users\aaa\Documents\Python\yolov7\utils\loss.py", line 585, in __call__
bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
File "C:\Users\aaa\Documents\Python\yolov7\utils\loss.py", line 733, in build_targets
torch.log(y/(1-y)) , gt_cls_per_image, reduction="none"
File "C:\Users\aaa\anaconda3\envs\yolov7_04072023\lib\site-packages\torch\nn\functional.py", line 3132, in binary_cross_entropy_with_logits
return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: CUDA out of memory. Tried to allocate 6.07 GiB (GPU 0; 11.00 GiB total capacity; 37.90 GiB already allocated; 0 bytes free; 39.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
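The error message itself suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. As a sketch, on Windows that would look something like the following (the 128 MiB split size is an arbitrary starting value, not a setting taken from the repo):

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python train.py --workers 1 --batch-size 1 --data academy2/cfg.yaml --img 640 640 --cfg academy2/yolov7.yaml --weights yolov7.pt --name yolov7-academy2 --hyp academy2/hyp.scratch.p5.yaml

Note this only helps when the failure comes from allocator fragmentation (reserved memory much larger than allocated, as the message says), so it may not be enough on its own.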
I am facing similar problems with a large dataset (120K images, 450K labels). My guesstimate at this point is that there is not enough system RAM for CUDA to allocate the buffer it uses to transfer data to the GPU.
For me it seems to have to do with loss_ota. When I set it to 0, training runs without problems at batch_size = 2 and image size 1280; when it is not 0, I can't even run it at batch_size = 1, img_size = 400. It looks like there is some memory inefficiency in that loss; I don't know whether it's a bug or the loss is simply supposed to eat that much memory.
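If your hyperparameter file follows the layout of the repo's hyp.scratch.p5.yaml, turning OTA loss off should be a one-line change in that yaml, roughly (the exact comment wording may differ between versions):

loss_ota: 0  # 1 = use ComputeLossOTA, 0 = plain loss (lower memory use)

With loss_ota set to 0, train.py should fall back to the plain ComputeLoss path instead of the ComputeLossOTA call shown in the traceback above.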