
CUDA out of memory after some epochs

Open • alejoGT1202 opened this issue 2 years ago • 1 comment

I'm training on an EC2 instance with a T4 GPU and 16 GB of memory.

I'm using a batch size of 2 and an image size of 960, but after 3 epochs the script is killed because the GPU runs out of memory. How can I fix this without reducing my batch size to 1?

Thanks for the help.

alejoGT1202 • Mar 16 '22 09:03

Hi, you can change this line so the validation batch size is not doubled:

    https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/train.py#L219

and line:

    https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/train.py#L361

I'm not sure why the batch size is doubled during validation, but making that change solved the issue for me.
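A rough way to see why this helps: the memory a forward pass needs scales roughly linearly with batch size, so validating at `batch_size * 2` asks for about twice the per-batch memory of a training-sized batch. The plain-Python sketch below is only an illustration under that assumption; `val_batch_size` and `input_tensor_bytes` are hypothetical names, not part of yolor. It computes the size of the input tensor alone for the settings in the question (batch 2, image size 960, float32). The input is a small part of total GPU memory, but it grows with batch size the same way the activations do.

```python
# Hypothetical helpers for illustration only; not yolor code.

def val_batch_size(train_batch_size: int, multiplier: int = 1) -> int:
    """Validation batch size; multiplier=2 mirrors the doubling discussed above."""
    return train_batch_size * multiplier


def input_tensor_bytes(batch: int, img_size: int,
                       channels: int = 3, bytes_per_elem: int = 4) -> int:
    """Bytes for one input batch of shape (batch, channels, img_size, img_size).

    bytes_per_elem=4 assumes float32; use 2 for float16 inputs.
    """
    return batch * channels * img_size * img_size * bytes_per_elem


if __name__ == "__main__":
    train_bs, img = 2, 960
    for mult in (1, 2):
        bs = val_batch_size(train_bs, mult)
        mib = input_tensor_bytes(bs, img) / 2**20
        print(f"validation batch {bs}: input tensor {mib:.2f} MiB")
```

Doubling the batch doubles this number exactly; the larger activation tensors inside the network grow in roughly the same proportion, which is where most of the extra memory actually goes.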

mburges-cvl • Mar 29 '22 11:03