YOLOv5 for segmentation crashes in the first epoch
Search before asking
- [X] I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
No response
Bug
On a machine with a single GPU, running
python segment/train.py --data data/sagemaker.yaml --weights 'yolov5s-seg.pt' --cfg yolov5s-seg.yaml --hyp hyp.scratch-high.yaml --img 640 --batch 90 --workers 8 --project runs/SageMaker/train --name 02-S-640-hyphigh
constantly crashes in the first epoch with:
14/299 13.5G 0.04768 0.05671 0.06603 0.0182 965 608: 35%|███▍ | 1121/3205 [11:18<23:05, 1.50it/s]libpng warning: iCCP: known incorrect sRGB profile
14/299 13.5G 0.04767 0.0567 0.06599 0.0182 1061 608: 36%|███▌ | 1145/3205 [11:33<20:23, 1.68it/s]libpng warning: iCCP: known incorrect sRGB profile
14/299 13.5G 0.04765 0.05671 0.06595 0.0182 1098 608: 41%|████ | 1312/3205 [13:12<17:14, 1.83it/s]libpng warning: iCCP: known incorrect sRGB profile
14/299 13.5G 0.04762 0.05672 0.06589 0.0182 1158 608: 45%|████▍ | 1442/3205 [14:30<17:43, 1.66it/s]
Traceback (most recent call last):
File "segment/train.py", line 658, in <module>
main(opt)
File "segment/train.py", line 554, in main
train(opt.hyp, opt, device, callbacks)
File "segment/train.py", line 283, in train
for i, (imgs, targets, paths, _, masks) in pbar: # batch ------------------------------------------------------
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/jupyter/yolov5/utils/dataloaders.py", line 172, in __iter__
yield next(self.iterator)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jupyter/yolov5/utils/segment/dataloaders.py", line 120, in __getitem__
img, labels, segments = mixup(img, labels, segments, *self.load_mosaic(random.randint(0, self.n - 1)))
File "/home/jupyter/yolov5/utils/segment/dataloaders.py", line 218, in load_mosaic
img, _, (h, w) = self.load_image(index)
File "/home/jupyter/yolov5/utils/dataloaders.py", line 732, in load_image
im = np.load(fn)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/npyio.py", line 432, in load
return format.read_array(fid, allow_pickle=allow_pickle,
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/format.py", line 820, in read_array
array.shape = shape
ValueError: cannot reshape array of size 0 into shape (480,640,3)
Environment
YOLOv5 🚀 v7.0-59-gfdc35b1 Python-3.8.15 torch-1.10.1 CUDA:0
Minimal Reproducible Example
python segment/train.py --data data/sagemaker.yaml --weights 'yolov5s-seg.pt' --cfg yolov5s-seg.yaml --hyp hyp.scratch-high.yaml --img 640 --batch 90 --workers 8 --project runs/SageMaker/train
Additional
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
@Robotatron I had the same issue. It got resolved after setting mixup to 0 in the hyp file.
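For reference, a minimal sketch of that change, assuming the hyp file sits at data/hyps/hyp.scratch-high.yaml in your checkout (you can of course also just edit the value by hand):

```python
# Hypothetical helper: write a copy of the hyp file with MixUp disabled.
# Paths are assumptions; adjust them to your checkout.
import yaml

src = "data/hyps/hyp.scratch-high.yaml"
dst = "data/hyps/hyp.scratch-high-nomixup.yaml"

with open(src) as f:
    hyp = yaml.safe_load(f)

hyp["mixup"] = 0.0  # disable MixUp augmentation

with open(dst, "w") as f:
    yaml.safe_dump(hyp, f, sort_keys=False)
```

Then pass --hyp data/hyps/hyp.scratch-high-nomixup.yaml when training.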
Just trained a model without any augmentations at all (with --hyp hyp.no-augmentation.yaml) and it still crashes. What's weird is that everything worked fine 2 days ago and I did not make any changes to the images or annotations:
python segment/train.py --data data/sagemaker.yaml --weights 'yolov5n-seg.pt' --cfg yolov5n-seg.yaml --hyp hyp.no-augmentation.yaml --img 320 --batch -1 --project runs/SageMaker/train --name no_aug
Plotting labels to runs/SageMaker/train/no_aug2/labels.jpg...
libpng warning: iCCP: known incorrect sRGB profile
Image sizes 320 train, 320 val
Using 8 dataloader workers
Logging results to runs/SageMaker/train/no_aug2
Starting training for 300 epochs...
Epoch GPU_mem box_loss seg_loss obj_loss cls_loss Instances Size
0/299 26.6G 0.1125 0.1356 0.02828 0.05641 4705 320: 21%|██ | 54/255 [04:00<14:23, 4.30s/it]libpng warning: iCCP: known incorrect sRGB profile
0/299 26.6G 0.09383 0.1079 0.03174 0.05172 4861 320: 45%|████▌ | 115/255 [08:13<09:36, 4.12s/it]libpng warning: iCCP: known incorrect sRGB profile
0/299 26.6G 0.08895 0.1016 0.03201 0.05017 4884 320: 58%|█████▊ | 147/255 [10:26<07:39, 4.25s/it]libpng warning: iCCP: known incorrect sRGB profile
0/299 26.6G 0.08428 0.09567 0.03208 0.04843 4936 320: 76%|███████▌ | 193/255 [13:36<04:22, 4.23s/it]
Traceback (most recent call last):
File "segment/train.py", line 658, in <module>
main(opt)
File "segment/train.py", line 554, in main
train(opt.hyp, opt, device, callbacks)
File "segment/train.py", line 283, in train
for i, (imgs, targets, paths, _, masks) in pbar: # batch ------------------------------------------------------
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/jupyter/yolov5/utils/dataloaders.py", line 172, in __iter__
yield next(self.iterator)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jupyter/yolov5/utils/segment/dataloaders.py", line 124, in __getitem__
img, (h0, w0), (h, w) = self.load_image(index)
File "/home/jupyter/yolov5/utils/dataloaders.py", line 732, in load_image
im = np.load(fn)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/npyio.py", line 432, in load
return format.read_array(fid, allow_pickle=allow_pickle,
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/format.py", line 820, in read_array
array.shape = shape
ValueError: cannot reshape array of size 0 into shape (480,640,3)
- Everything worked fine 2-3 days ago and I did not make any changes to the images or annotations
- I went through all images with a Python script to make sure PIL can open each image and that its len(shape) is 3 (width, height, and channels), roughly like the sketch below
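Reconstructed sketch of that check (not the exact script; IMG_DIR is a placeholder for the dataset's image folder):

```python
# Open every image with PIL and confirm it decodes to an H x W x 3 array.
# IMG_DIR is a placeholder; point it at the training images.
from pathlib import Path

import numpy as np
from PIL import Image

IMG_DIR = Path("datasets/sagemaker/images")

for p in sorted(IMG_DIR.rglob("*")):
    if p.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
        continue
    try:
        with Image.open(p) as im:
            im.verify()  # cheap integrity check of the encoded file
        arr = np.asarray(Image.open(p).convert("RGB"))
        assert arr.ndim == 3 and arr.shape[2] == 3, arr.shape
    except Exception as e:
        print(f"BAD IMAGE: {p} -> {e}")
```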
Any idea what is going on and what else can I try to debug this? @glenn-jocher
Hi, try not passing --hyp at all and let it fall back to the default. This might help, as I've seen all sorts of issues with it on regular object detection.
Hope this helps!
Still got the same error; if you omit --hyp, YOLO just falls back to hyp.scratch-low.yaml by default.
It seems to be a problem not with image augmentation but with the way YOLO loads the images in general, since it crashes even with --hyp hyp.no-augmentation.yaml.
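The traceback dies inside np.load(fn) in load_image, i.e. while reading a .npy copy of an image that comes back with zero elements, so one thing worth scanning for (sketch below, IMG_DIR is a placeholder) is empty or truncated .npy files sitting alongside the images:

```python
# Flag .npy files that are empty or fail to load, since the traceback fails
# in np.load() on an array of size 0. IMG_DIR is a placeholder path.
from pathlib import Path

import numpy as np

IMG_DIR = Path("datasets/sagemaker/images")

for p in sorted(IMG_DIR.rglob("*.npy")):
    try:
        arr = np.load(p)
        if arr.size == 0:
            print(f"empty array: {p}")
    except Exception as e:
        print(f"unloadable: {p} -> {e}")
```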
I see, I'm no use in this case then, sorry... I hope you end up solving your problem sooner than later though, good luck! :rocket:
Your image may be corrupted. Please check it, for example by reading every image with OpenCV.
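A minimal sketch of such a check (IMG_DIR is a placeholder for your image folder):

```python
# Read every image with OpenCV; cv2.imread returns None for files it cannot decode.
# IMG_DIR is a placeholder; point it at the training images.
from pathlib import Path

import cv2

IMG_DIR = Path("datasets/sagemaker/images")

for p in sorted(IMG_DIR.rglob("*.jpg")):
    if cv2.imread(str(p)) is None:
        print(f"unreadable: {p}")
```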
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 🚀 resources:
- Wiki – https://github.com/ultralytics/yolov5/wiki
- Tutorials – https://docs.ultralytics.com/yolov5
- Docs – https://docs.ultralytics.com
Access additional Ultralytics ⚡ resources:
- Ultralytics HUB – https://ultralytics.com/hub
- Vision API – https://ultralytics.com/yolov5
- About Us – https://ultralytics.com/about
- Join Our Team – https://ultralytics.com/work
- Contact Us – https://ultralytics.com/contact
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@erquren hi, thanks for your suggestion! For the reported issue, it appears not to be caused by image-specific errors, as the images have been validated and the error persists even without hyp augmentations.
If you have any other ideas for debugging or resolving this issue, please feel free to share them. Thank you!