semantic-segmentation

training error on colab

Open hb0313 opened this issue 2 years ago • 3 comments

My setup for training on Colab completed successfully. However, when I run

!python tools/train.py --cfg configs/CONFIG_FILE.yaml

I get this error:

Found 20210 training images.
Found 2000 validation images.
Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/train.py", line 128, in <module>
    main(cfg, gpu, save_dir)
  File "tools/train.py", line 69, in main
    for iter, (img, lbl) in pbar:
  File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in __getitem__
    image, label = self.transform(image, label)
  File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in __call__
    img, mask = transform(img, mask)
  File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in __call__
    mask = TF.pad(mask, padding, fill=self.seg_fill)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad
    return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad
    img = torch_pad(img, p, mode=padding_mode, value=float(fill))
RuntimeError: value cannot be converted to type uint8_t without overflow

hb0313 avatar Sep 14 '22 04:09 hb0313

I think this is a PyTorch version mismatch error. Please try a different PyTorch version.

sithu31296 avatar Sep 14 '22 05:09 sithu31296

Hello, I am training on CamVid and get this error:

min_value = pred[min(self.min_kept, pred.numel() - 1)]
IndexError: index -1 is out of bounds for dimension 0 with size 0
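
For readers hitting the same IndexError: the expression min(self.min_kept, pred.numel() - 1) evaluates to -1 when pred has no elements, which matches the "index -1 ... size 0" message. Why pred ends up empty is not established in this thread; one possibility is that no valid (non-ignored) pixels were left in the batch. The plain-Python sketch below only illustrates the index arithmetic and an empty-input guard. The function name pick_threshold and its default parameter are hypothetical, and a list stands in for the tensor.

```python
def pick_threshold(values, min_kept, default=None):
    """Return values[min(min_kept, n - 1)], or `default` when values is empty.

    Hypothetical helper: `values` stands in for the `pred` tensor
    from the error message; it is not the repository's code.
    """
    n = len(values)
    if n == 0:
        # Without this guard the index would be min(min_kept, -1) == -1,
        # and indexing an empty sequence with -1 raises IndexError.
        return default
    return values[min(min_kept, n - 1)]


if __name__ == "__main__":
    print(pick_threshold([0.9, 0.5, 0.2], 1))   # index 1 is in range
    print(pick_threshold([0.9, 0.5, 0.2], 10))  # index is clamped to the last element
    print(pick_threshold([], 10))               # empty input falls back to default
```

If the guard triggers often, it is worth checking whether the labels and the configured ignore value agree with the dataset.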

scl666 avatar Oct 11 '22 07:10 scl666

Hello,

I encountered the same error, and updating torch and torchvision did not resolve it. The issue appears to arise when seg_fill receives a value of -1, as defined in the ade20k config file (IGNORE_LABEL: -1). Changing the value of IGNORE_LABEL resolved the problem. Could you please advise on the appropriate value that IGNORE_LABEL should be set to?
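
The traceback is consistent with this diagnosis: the failing call passes float(fill) to a pad on a tensor that the error message identifies as uint8, and an unsigned 8-bit value can only hold 0 through 255, so -1 cannot be stored. Below is a minimal sketch of a range check that could be run on the configured value before training. The helper names fits_uint8 and check_seg_fill are hypothetical and not part of the repository. Which in-range value is correct depends on how the dataset class remaps labels after augmentation, which this thread does not settle; 255 is a common ignore index in segmentation datasets, but treat it as an assumption here.

```python
UINT8_MIN, UINT8_MAX = 0, 255


def fits_uint8(value: int) -> bool:
    """Return True if `value` can be stored in an unsigned 8-bit mask."""
    return UINT8_MIN <= value <= UINT8_MAX


def check_seg_fill(seg_fill: int) -> int:
    """Hypothetical helper: reject fill values that a uint8 mask cannot hold."""
    if not fits_uint8(seg_fill):
        raise ValueError(
            f"seg_fill={seg_fill} is outside the uint8 range "
            f"[{UINT8_MIN}, {UINT8_MAX}]"
        )
    return seg_fill


if __name__ == "__main__":
    for candidate in (-1, 0, 255):
        print(candidate, fits_uint8(candidate))
```

A check like this fails fast with a readable message instead of surfacing as an overflow inside a DataLoader worker.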

ilanaKarimov avatar Jun 16 '23 16:06 ilanaKarimov