
CUDA out of memory for larger dataset.


I am able to train yolov7 on a small dataset, but when I increase the dataset size from ~300 images to about 2000 it breaks with the error below. I use the exact same training parameters. Any idea why the size of the dataset would matter? I tried batch size 1 with the same result. I have a 2080 Ti, so I shouldn't be hitting a memory limit here.

python train.py --workers 1 --batch-size 1 --data academy2/cfg.yaml --img 640 640 --cfg academy2/yolov7.yaml --weights yolov7.pt --name yolov7-academy2 --hyp academy2/hyp.scratch.p5.yaml

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
     0/299     10.7G   0.08955   0.09489   0.07693    0.2614       172       640:   0%|▎                                                                                                                 | 5/1887 [00:17<1:52:11,  3.58s/it]
Traceback (most recent call last):
  File "train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 363, in train
    loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs)  # loss scaled by batch_size
  File "C:\Users\aaa\Documents\Python\yolov7\utils\loss.py", line 585, in __call__
    bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
  File "C:\Users\aaa\Documents\Python\yolov7\utils\loss.py", line 733, in build_targets
    torch.log(y/(1-y)) , gt_cls_per_image, reduction="none"
  File "C:\Users\aaa\anaconda3\envs\yolov7_04072023\lib\site-packages\torch\nn\functional.py", line 3132, in binary_cross_entropy_with_logits
    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: CUDA out of memory. Tried to allocate 6.07 GiB (GPU 0; 11.00 GiB total capacity; 37.90 GiB already allocated; 0 bytes free; 39.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

mazatov · Nov 10 '23 22:11
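
The error text itself suggests trying the allocator's max_split_size_mb hint. That only helps if fragmentation is the real problem, but it costs nothing to try; the value below is illustrative, set in the shell (Windows, per the paths in the traceback) before relaunching the same command:

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python train.py --workers 1 --batch-size 1 --data academy2/cfg.yaml --img 640 640 --cfg academy2/yolov7.yaml --weights yolov7.pt --name yolov7-academy2 --hyp academy2/hyp.scratch.p5.yaml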

I am facing similar problems with a large dataset (120K images, 450K labels). My guesstimate at this point is that there is not enough system RAM for CUDA to allocate the buffer used to transfer data to the GPU.

skoroneos · Nov 17 '23 06:11
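
One way to test that guess is to log host RAM and GPU memory side by side around the loss computation. The sketch below is standalone Python, not part of the yolov7 code; it assumes psutil is installed, and report_memory is a hypothetical helper you would call, for example, just before and after the compute_loss_ota call in train.py:

import psutil
import torch

def report_memory(tag=""):
    # Host (system) RAM, as seen by the OS.
    vm = psutil.virtual_memory()
    print(f"[{tag}] host RAM used: {vm.used / 1e9:.1f} / {vm.total / 1e9:.1f} GB")
    # GPU memory as tracked by PyTorch's caching allocator.
    if torch.cuda.is_available():
        print(f"[{tag}] GPU allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB, "
              f"reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")

If the numbers that spike are the GPU ones (as the traceback above suggests), the bottleneck is GPU memory rather than system RAM.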

For me it seems to be related to loss_ota. When I set it to 0, training runs without problems at batch_size = 2 and image size 1280; when it's not 0, I can't even run it at batch_size = 1, img_size = 400. It seems like there is some memory inefficiency in that loss. I don't know whether it's a bug or whether the loss is simply supposed to use that much memory.

mazatov · Nov 17 '23 07:11
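
For anyone hitting the same wall: the loss_ota switch described above is, assuming the stock hyp layout, a key in the hyperparameter yaml passed via --hyp; with it set to 0, train.py should take the plain ComputeLoss path instead of ComputeLossOTA. A minimal sketch of the change:

# in academy2/hyp.scratch.p5.yaml
loss_ota: 0  # 0 = plain ComputeLoss; 1 = ComputeLossOTA, which needs far more GPU memory

As far as I can tell, the OTA loss builds per-image cost matrices between candidate predictions and ground-truth boxes (the traceback fails inside build_targets on exactly such a tensor), so its memory use grows with the number of labels per image, which would explain why a larger or denser dataset hits OOM while a small one trains fine.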