yolor
CUDA out of memory after some epochs
I'm training on an EC2 instance with a T4 GPU and 16 GB of memory.
I'm using a batch size of 2 and an image size of 960. However, after 3 epochs the script is killed because the GPU runs out of memory. How can I overcome this without reducing my batch size to 1?
Thanks for the help.
Hi, you can change the batch size used for validation on this line:
https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/train.py#L219
and on this line:
https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/train.py#L361
I'm not sure why the batch size is doubled during validation, but removing the doubling solved the issue for me.
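For anyone landing here later, this is a rough sketch of the idea in plain Python. It is not the actual yolor code: the helper names `val_batch_size` and `input_elements` are made up for illustration, and the input tensor size is only a crude proxy for GPU memory, since activations also grow with the batch size.

```python
def val_batch_size(train_batch_size: int, multiplier: int = 1) -> int:
    """Hypothetical helper: batch size to use for the validation loader.

    multiplier=2 mimics doubling the batch size during validation;
    multiplier=1 keeps validation at the training batch size.
    """
    if train_batch_size < 1 or multiplier < 1:
        raise ValueError("batch size and multiplier must be >= 1")
    return train_batch_size * multiplier


def input_elements(batch_size: int, img_size: int, channels: int = 3) -> int:
    """Number of values in one batch of square input images."""
    return batch_size * channels * img_size * img_size


if __name__ == "__main__":
    train_bs, img_size = 2, 960  # values from the question above
    for multiplier in (2, 1):
        bs = val_batch_size(train_bs, multiplier)
        print(f"multiplier={multiplier}: val batch size={bs}, "
              f"input elements per batch={input_elements(bs, img_size)}")
```

With a training batch size of 2, the doubled setting validates 4 images at a time, so each validation step pushes twice as much input through the network as a training step does. Setting the multiplier to 1 keeps both phases at 2 images per step.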