YOLOv5 for segmentation crashes in the first epoch
Search before asking
- [X] I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
No response
Bug
On a machine with a single GPU, running
python segment/train.py --data data/sagemaker.yaml --weights 'yolov5s-seg.pt' --cfg yolov5s-seg.yaml --hyp hyp.scratch-high.yaml --img 640 --batch 90 --workers 8 --project runs/SageMaker/train --name 02-S-640-hyphigh
constantly crashes in the first epoch with:
14/299 13.5G 0.04768 0.05671 0.06603 0.0182 965 608: 35%|███▍ | 1121/3205 [11:18<23:05, 1.50it/s]libpng warning: iCCP: known incorrect sRGB profile
14/299 13.5G 0.04767 0.0567 0.06599 0.0182 1061 608: 36%|███▌ | 1145/3205 [11:33<20:23, 1.68it/s]libpng warning: iCCP: known incorrect sRGB profile
14/299 13.5G 0.04765 0.05671 0.06595 0.0182 1098 608: 41%|████ | 1312/3205 [13:12<17:14, 1.83it/s]libpng warning: iCCP: known incorrect sRGB profile
14/299 13.5G 0.04762 0.05672 0.06589 0.0182 1158 608: 45%|████▍ | 1442/3205 [14:30<17:43, 1.66it/s]
Traceback (most recent call last):
File "segment/train.py", line 658, in <module>
main(opt)
File "segment/train.py", line 554, in main
train(opt.hyp, opt, device, callbacks)
File "segment/train.py", line 283, in train
for i, (imgs, targets, paths, _, masks) in pbar: # batch ------------------------------------------------------
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/jupyter/yolov5/utils/dataloaders.py", line 172, in __iter__
yield next(self.iterator)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jupyter/yolov5/utils/segment/dataloaders.py", line 120, in __getitem__
img, labels, segments = mixup(img, labels, segments, *self.load_mosaic(random.randint(0, self.n - 1)))
File "/home/jupyter/yolov5/utils/segment/dataloaders.py", line 218, in load_mosaic
img, _, (h, w) = self.load_image(index)
File "/home/jupyter/yolov5/utils/dataloaders.py", line 732, in load_image
im = np.load(fn)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/npyio.py", line 432, in load
return format.read_array(fid, allow_pickle=allow_pickle,
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/format.py", line 820, in read_array
array.shape = shape
ValueError: cannot reshape array of size 0 into shape (480,640,3)
Environment
YOLOv5 🚀 v7.0-59-gfdc35b1 Python-3.8.15 torch-1.10.1 CUDA:0
Minimal Reproducible Example
python segment/train.py --data data/sagemaker.yaml --weights 'yolov5s-seg.pt' --cfg yolov5s-seg.yaml --hyp hyp.scratch-high.yaml --img 640 --batch 90 --workers 8 --project runs/SageMaker/train
Additional
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
@Robotatron I had the same issue. It got resolved after setting mixup to 0 in the hyp file.
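For reference, a minimal sketch of that change, assuming the hyp file sits at data/hyps/hyp.scratch-high.yaml in your checkout (you can of course also just edit the value by hand):

```python
# Hypothetical helper: write a copy of the hyp file with MixUp disabled.
# Paths are assumptions; adjust them to your checkout.
import yaml

src = "data/hyps/hyp.scratch-high.yaml"
dst = "data/hyps/hyp.scratch-high-nomixup.yaml"

with open(src) as f:
    hyp = yaml.safe_load(f)

hyp["mixup"] = 0.0  # disable MixUp augmentation

with open(dst, "w") as f:
    yaml.safe_dump(hyp, f, sort_keys=False)
```

Then pass --hyp data/hyps/hyp.scratch-high-nomixup.yaml when training.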
Just trained a model without any augmentations at all (with --hyp hyp.no-augmentation.yaml) and it still crashes. What's weird is that everything worked fine 2 days ago and I did not make any changes to the images or annotations:
python segment/train.py --data data/sagemaker.yaml --weights 'yolov5n-seg.pt' --cfg yolov5n-seg.yaml --hyp hyp.no-augmentation.yaml --img 320 --batch -1 --project runs/SageMaker/train --name no_aug
Plotting labels to runs/SageMaker/train/no_aug2/labels.jpg...
libpng warning: iCCP: known incorrect sRGB profile
Image sizes 320 train, 320 val
Using 8 dataloader workers
Logging results to runs/SageMaker/train/no_aug2
Starting training for 300 epochs...
Epoch GPU_mem box_loss seg_loss obj_loss cls_loss Instances Size
0/299 26.6G 0.1125 0.1356 0.02828 0.05641 4705 320: 21%|██ | 54/255 [04:00<14:23, 4.30s/it]libpng warning: iCCP: known incorrect sRGB profile
0/299 26.6G 0.09383 0.1079 0.03174 0.05172 4861 320: 45%|████▌ | 115/255 [08:13<09:36, 4.12s/it]libpng warning: iCCP: known incorrect sRGB profile
0/299 26.6G 0.08895 0.1016 0.03201 0.05017 4884 320: 58%|█████▊ | 147/255 [10:26<07:39, 4.25s/it]libpng warning: iCCP: known incorrect sRGB profile
0/299 26.6G 0.08428 0.09567 0.03208 0.04843 4936 320: 76%|███████▌ | 193/255 [13:36<04:22, 4.23s/it]
Traceback (most recent call last):
File "segment/train.py", line 658, in <module>
main(opt)
File "segment/train.py", line 554, in main
train(opt.hyp, opt, device, callbacks)
File "segment/train.py", line 283, in train
for i, (imgs, targets, paths, _, masks) in pbar: # batch ------------------------------------------------------
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/jupyter/yolov5/utils/dataloaders.py", line 172, in __iter__
yield next(self.iterator)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jupyter/yolov5/utils/segment/dataloaders.py", line 124, in __getitem__
img, (h0, w0), (h, w) = self.load_image(index)
File "/home/jupyter/yolov5/utils/dataloaders.py", line 732, in load_image
im = np.load(fn)
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/npyio.py", line 432, in load
return format.read_array(fid, allow_pickle=allow_pickle,
File "/opt/conda/envs/oneformer/lib/python3.8/site-packages/numpy/lib/format.py", line 820, in read_array
array.shape = shape
ValueError: cannot reshape array of size 0 into shape (480,640,3)
- Everything worked fine 2-3 days ago and I did not make any changes to the images or annotations
- I went through all images with a Python script to make sure PIL can open each image and that its len(shape) is 3 (width, height, and channels), roughly like the sketch below
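Reconstructed sketch of that check (not the exact script; IMG_DIR is a placeholder for the dataset's image folder):

```python
# Open every image with PIL and confirm it decodes to an H x W x 3 array.
# IMG_DIR is a placeholder; point it at the training images.
from pathlib import Path

import numpy as np
from PIL import Image

IMG_DIR = Path("datasets/sagemaker/images")

for p in sorted(IMG_DIR.rglob("*")):
    if p.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
        continue
    try:
        with Image.open(p) as im:
            im.verify()  # cheap integrity check of the encoded file
        arr = np.asarray(Image.open(p).convert("RGB"))
        assert arr.ndim == 3 and arr.shape[2] == 3, arr.shape
    except Exception as e:
        print(f"BAD IMAGE: {p} -> {e}")
```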
Any idea what is going on and what else can I try to debug this? @glenn-jocher
Hi, try not passing --hyp at all and let it fall back to the default. This might help, as I've seen all sorts of issues with it on regular object detection.
Hope this helps!
Still got the same error; if you omit --hyp, YOLO just falls back to hyp.scratch-low.yaml by default.
It seems to be a problem not with image augmentation but with the way YOLO loads the images in general, since it crashes even with --hyp hyp.no-augmentation.yaml.
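The traceback dies inside np.load(fn) in load_image, i.e. while reading a .npy copy of an image that comes back with zero elements, so one thing worth scanning for (sketch below, IMG_DIR is a placeholder) is empty or truncated .npy files sitting alongside the images:

```python
# Flag .npy files that are empty or fail to load, since the traceback fails
# in np.load() on an array of size 0. IMG_DIR is a placeholder path.
from pathlib import Path

import numpy as np

IMG_DIR = Path("datasets/sagemaker/images")

for p in sorted(IMG_DIR.rglob("*.npy")):
    try:
        arr = np.load(p)
        if arr.size == 0:
            print(f"empty array: {p}")
    except Exception as e:
        print(f"unloadable: {p} -> {e}")
```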
I see, I'm no use in this case then, sorry... I hope you end up solving your problem sooner than later though, good luck! :rocket:
Your image may be corrupted. Please check it, for example by reading every image with OpenCV.
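A minimal sketch of such a check (IMG_DIR is a placeholder for your image folder):

```python
# Read every image with OpenCV; cv2.imread returns None for files it cannot decode.
# IMG_DIR is a placeholder; point it at the training images.
from pathlib import Path

import cv2

IMG_DIR = Path("datasets/sagemaker/images")

for p in sorted(IMG_DIR.rglob("*.jpg")):
    if cv2.imread(str(p)) is None:
        print(f"unreadable: {p}")
```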
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 🚀 resources:
- Wiki – https://github.com/ultralytics/yolov5/wiki
- Tutorials – https://docs.ultralytics.com/yolov5
- Docs – https://docs.ultralytics.com
Access additional Ultralytics ⚡ resources:
- Ultralytics HUB – https://ultralytics.com/hub
- Vision API – https://ultralytics.com/yolov5
- About Us – https://ultralytics.com/about
- Join Our Team – https://ultralytics.com/work
- Contact Us – https://ultralytics.com/contact
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@erquren hi, thanks for your suggestion! For the reported issue, it appears not to be caused by image-specific errors, as the images have been validated and the error persists even without hyp augmentations.
If you have any other ideas for debugging or resolving this issue, please feel free to share them. Thank you!