
Out Of Memory Error when training any model

Open CretuCalin opened this issue 4 years ago • 3 comments

I'm getting an OOM error when training any model. It's probably not caused by the batch size, since the error only appears after a number of iterations.

Script for training:

cd src
python train.py mot --exp_id crowdhuman_dla34 --gpus 0 --batch_size 4 \
    --load_model '../models/ctdet_coco_dla_2x.pth' \
    --num_epochs 60 --lr_step '50' \
    --data_cfg '../src/lib/cfg/crowdhuman.json' \
    --arch=hrnet_18
cd ..

Console output during training:

System information:
Ubuntu 16.04.6 LTS
CUDA 10.2.89
PyTorch 1.7.1

CretuCalin avatar Mar 15 '21 15:03 CretuCalin
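Since the OOM only appears after many iterations, it usually points to GPU memory that grows over time rather than a batch that is too large from the start. A minimal sketch (not part of FairMOT; the helper name is made up) for logging allocated memory inside the training loop, so a steady upward trend can confirm a leak:

```python
import torch

def log_gpu_memory(step, every=100):
    """Print allocated/reserved GPU memory every `every` steps.

    Steadily growing numbers across iterations point to tensors being
    kept alive (e.g. autograd graphs held by accumulated losses).
    Returns the allocated MiB when it logs, otherwise None.
    """
    if not torch.cuda.is_available():
        return None
    if step % every != 0:
        return None
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    print(f"step {step}: allocated {alloc:.1f} MiB, reserved {reserved:.1f} MiB")
    return alloc
```

Calling this every few hundred iterations of `train.py`'s loop would show whether memory really climbs toward the OOM or stays flat (in which case a single oversized batch, e.g. a crowded image, is the more likely culprit).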

I have the same problem now. I added torch.cuda.empty_cache(), but the result is the same.
Has anyone solved this problem?

yunseung-dable avatar May 12 '21 03:05 yunseung-dable
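Note that torch.cuda.empty_cache() only releases cached-but-unused blocks back to the driver; it cannot free tensors that are still referenced. A frequent cause of memory that grows over iterations is accumulating the loss tensor itself, which keeps each step's autograd graph alive. A minimal CPU-only sketch of that pattern, with a hypothetical model and optimizer (not FairMOT's actual training code):

```python
import torch

# Hypothetical tiny model/optimizer, just to illustrate the pattern.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for step in range(10):
    x = torch.randn(8, 4)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # running_loss += loss        # leaks: retains the whole graph each step
    running_loss += loss.item()   # detached Python float; graph is freed
```

If the training code (or a logging hook) stores `loss` instead of `loss.item()`, memory grows linearly with iterations until the OOM, which matches the symptom described here.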

I also have the same problem; it happens after a number of epochs.

zhouzw87 avatar Jul 28 '21 05:07 zhouzw87

I'm hitting the same problem after a number of epochs. Has anyone solved it?

HeBotong avatar Sep 29 '22 10:09 HeBotong