CUDA error when training on rectangular inputs at full resolution
Search before asking
- [X] I have searched the YOLOv8 issues and discussions and found no similar questions.
Question
The Component
Training
The Issue
I keep running into a CUDA error when trying to train on rectangular images. My images are 1920x1080, and I was able to train just fine at the default image size of 640. I then tried specifying imgsz as [1920,1080] and ran into the same error as #785, so following that thread, I changed imgsz to just 1280 with rect=True. Still not full resolution, but better than 640.
I tried a few resolutions, but the largest I was able to successfully start training at was imgsz=1056, which is the largest multiple of 32 below the short side of my images (1080). I'm training on that now, so I'll see how it goes. But I would like to use as close to full resolution as possible, because some of the objects I'm trying to detect are quite small.
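As an aside, the largest stride-aligned size can be computed directly rather than found by trial; a minimal sketch, assuming YOLOv8's maximum model stride of 32:

```python
# Hedged sketch: YOLOv8 requires imgsz to be a multiple of the model's
# maximum stride (32), so round the image side down to the nearest multiple.
def largest_stride_multiple(side: int, stride: int = 32) -> int:
    """Largest multiple of `stride` that does not exceed `side`."""
    return (side // stride) * stride

print(largest_stride_multiple(1080))  # 1056, the value found by trial above
print(largest_stride_multiple(1920))  # 1920, already stride-aligned
```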
The Call
results = model.train(data=f"{path2}", epochs=100, device=0, imgsz=1280, rect=True, cache=False)
Backtrace
Traceback (most recent call last):
File "train.py", line 32, in
Additional
No response
I see the same error today. I added some new images with a new class on Roboflow and generated a new dataset from Roboflow, and then YOLOv8 threw this error. When I use the old dataset, it works fine, so I'm not sure whether the problem is with YOLOv8 or the Roboflow dataset. Searching for this error online suggests the problem is the number of classes.
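One quick way to test the class-count theory is to compare the highest class index in the label files against the nc declared in the dataset YAML. A minimal sketch, assuming YOLO-format .txt labels (class index first on each line); the directory path is hypothetical:

```python
# Hedged sketch: scan YOLO-format label files for the highest class index.
# If it is >= nc in data.yaml, the dataset and config disagree, which commonly
# surfaces as a device-side CUDA assert during loss computation.
from pathlib import Path

def max_class_index(labels_dir: str) -> int:
    """Return the highest class index found across YOLO label files, or -1 if none."""
    max_idx = -1
    for txt in Path(labels_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                max_idx = max(max_idx, int(line.split()[0]))
    return max_idx

# Example (hypothetical path): if this returns a value >= nc, regenerate the
# dataset or update nc/names in data.yaml.
# print(max_class_index("dataset/labels/train"))
```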
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
@darouwan it's unfortunate that you're encountering this CUDA error while training YOLOv8 on rectangular images. The issue might not be directly related to the number of classes, as the error message points to a CUDA-level problem rather than a specific class issue. That said, it's worth checking whether any change in class distribution or dataset properties between the old and new datasets is triggering the error.
In general, optimizing the choice of input resolution via imgsz and the rect flag helps balance speed and accuracy during training. It's commendable that you're striving for the highest feasible resolution to detect small objects. Keep in mind that excessively large images may cause GPU memory issues, so finding the right trade-off between resolution and memory usage is vital.
To address the CUDA error, try updating your GPU drivers, using the latest PyTorch version, and ensuring that your CUDA toolkit is compatible with PyTorch. Additionally, setting CUDA_LAUNCH_BLOCKING=1 can help diagnose asynchronous CUDA kernel errors.
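Because CUDA kernel launches are asynchronous by default, the Python traceback often points at a later, unrelated line. A minimal sketch of the diagnostic suggested above; the train.py contents shown in comments are illustrative:

```python
# Hedged sketch: set CUDA_LAUNCH_BLOCKING before any CUDA work so kernel
# launches run synchronously and the traceback identifies the failing kernel.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

# ...then run training as usual, e.g.:
# from ultralytics import YOLO
# model = YOLO("yolov8n.pt")
# model.train(data="data.yaml", imgsz=1280, rect=True)
```

Equivalently, export the variable in the shell (CUDA_LAUNCH_BLOCKING=1 python train.py) so it is set before the interpreter starts.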
Lastly, if the issue persists, consider opening an issue on the YOLOv8 repo with additional details about your setup, including GPU type, PyTorch version, and any relevant environment configuration, so we can investigate further.
I hope these suggestions lead to a successful resolution!