RuntimeError: CUDA out of memory
I am trying to train a yolov7-tiny model on a custom dataset. I am training on Kaggle, which offers a free GPU, but PyTorch allocates more than 90% of the available memory, which causes training to fail. I tried training on my local machine and got the same error. I also tried reducing the image size, number of workers, and batch size, with the same result. I have no problems training YOLOv5 with the exact same setup.
My training command: !python train.py --workers 4 --device 0 --batch-size 16 --data /kaggle/working/dataset/config/custom.yaml --img 640 --cfg /kaggle/working/dataset/config/yolov7-custom.yaml --weights 'yolov7-tiny.pt' --name yolov7
Why does PyTorch allocate most of the GPU memory?
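For reference, part of what nvidia-smi reports is PyTorch's caching allocator: it keeps CUDA memory it has already freed so it can reuse it instead of returning it to the driver, so the reported usage is higher than what the live tensors actually need. Below is a minimal sketch for inspecting this from a notebook cell (the tensor size and tags are only illustrative):

```python
# Minimal sketch, assuming a CUDA GPU is available in the notebook session.
# PyTorch's caching allocator holds on to freed CUDA memory for reuse, so
# nvidia-smi shows the reserved pool, not just the memory live tensors need.
import torch

def report(tag):
    alloc = torch.cuda.memory_allocated() / 1024**2   # memory held by live tensors (MB)
    reserved = torch.cuda.memory_reserved() / 1024**2  # memory cached by the allocator (MB)
    print(f"{tag}: allocated={alloc:.1f} MB, reserved={reserved:.1f} MB")

report("start")
x = torch.randn(4096, 4096, device="cuda")  # ~64 MB tensor
report("after allocation")
del x
report("after del (cache still held)")
torch.cuda.empty_cache()                    # return cached blocks to the driver
report("after empty_cache")
```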
Error logs:
Traceback (most recent call last):
File "train.py", line 610, in
Decrease the batch size.
I also met the same problem; I get the same situation after some epochs. Not a solution, but as a workaround: the first time, train with --save_period 10, and after the error occurs, restart with --resume --save_period 10. Then it is possible to continue from the last saved checkpoint.
Same problem here! I changed the batch size to 1 and reduced the image dimensions and the number of workers, but the issue is still there. The GPU memory usage changes from iteration to iteration! I played with the PYTORCH_CUDA_ALLOC_CONF variable too, but the issue did not go away. I also realized that this happens when the number of classes is high (for example, over 20 classes); I tested it with classNum=3 and it worked like a charm.
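For anyone who wants to try the same thing, here is a minimal sketch of setting PYTORCH_CUDA_ALLOC_CONF; the max_split_size_mb:128 value is only an example, and as noted above it reduces fragmentation in the caching allocator but may not be enough on its own:

```python
# Minimal sketch, assuming a notebook session; max_split_size_mb:128 is only an
# example value. The variable must be set before the first CUDA allocation so
# the caching allocator reads it; subprocesses launched afterwards (e.g.
# !python train.py) inherit it from the environment.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.randn(1024, 1024, device="cuda")  # first CUDA allocation initializes the allocator
print(f"{torch.cuda.memory_reserved() / 1024**2:.1f} MB reserved by the caching allocator")
```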