pytorch-semantic-segmentation Crashing intermittently when training psp_net with VOC dataset.

I have been getting this error consistently before it manages to complete a single epoch:

[epoch 1], [iter 630 / 8498], [train main loss 1.32866], [train aux loss 1.31173]. [lr 0.0049055503]
Traceback (most recent call last):
  File "train.py", line 252, in <module>
    main()
  File "train.py", line 105, in main
    train(train_loader, net, criterion, optimizer, curr_epoch, args, val_loader, visualize)
  File "train.py", line 113, in train
    for i, data in enumerate(train_loader):
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/andreas/Dropbox/src/pytorch-semantic-segmentation/datasets/voc.py", line 99, in __getitem__
    img, mask = torch.stack(img_slices, 0), torch.stack(mask_slices, 0)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/functional.py", line 59, in stack
    raise ValueError("stack expects a non-empty sequence of tensors")
ValueError: stack expects a non-empty sequence of tensors

Jan 08 '18 03:01 andreasrobinson

I ran into the same problem. It seems that the data augmentation code doesn't work properly. Can you check it, @ZijunDeng?

Jan 09 '18 09:01 lzj322

@andreasrobinson @lzj322 Have you fixed this problem? I ran into it as well.

Jan 11 '18 03:01 zhijiew

I found it!

This error is caused by some of the training data. Maybe something went wrong when the data was preprocessed, so I just deleted those samples and now my training code runs.

Just delete these lines in train.txt: 724, 1237, 3572, 3920, 4688, 7031
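For anyone who wants to locate and drop the offending samples without editing the file by hand, here is a minimal sketch. The helper names (find_failing_indices, drop_lines, filter_list_file) are hypothetical and not part of the repository. It assumes the line numbers above are 1-based and that sample i of the dataset corresponds to line i + 1 of train.txt. If the transforms are random, a single pass may not catch every failing sample.

```python
def find_failing_indices(dataset, errors=(ValueError, RuntimeError)):
    """Return the 0-based indices for which dataset[i] raises one of `errors`.

    `dataset` only needs __len__ and __getitem__, like a PyTorch Dataset.
    """
    failing = []
    for i in range(len(dataset)):
        try:
            dataset[i]
        except errors:
            failing.append(i)
    return failing


def drop_lines(lines, bad_line_numbers):
    """Return `lines` without the entries at the given 1-based line numbers."""
    bad = set(bad_line_numbers)
    return [line for n, line in enumerate(lines, start=1) if n not in bad]


def filter_list_file(src_path, dst_path, bad_line_numbers):
    """Write a copy of the list file with the bad lines removed.

    Returns how many lines were dropped. Writing to a separate file keeps
    the original list intact (an assumption about the desired workflow).
    """
    with open(src_path) as src:
        lines = src.readlines()
    kept = drop_lines(lines, bad_line_numbers)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(lines) - len(kept)
```

With the numbers from this post, the call would look like filter_list_file("train.txt", "train_filtered.txt", [724, 1237, 3572, 3920, 4688, 7031]), where both file paths are placeholders. Indices returned by find_failing_indices are 0-based, so add 1 to each before passing them as line numbers.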

Jan 12 '18 07:01 zhijiew

@littlebelly, did you get a successful training result? Did you change any of the author's code?

Jan 15 '18 07:01 lzj322

I can train the network after deleting the training data I mentioned above, but there are still some errors in the validation process, so I gave up and switched to the Caffe code provided by the official author.

I suspect these errors are caused by the image slicing operation, which is needed for the Cityscapes dataset because of its large image size but is unnecessary for the VOC2012 dataset.

I hope @ZijunDeng can help to solve these errors~

Jan 15 '18 07:01 zhijiew

@littlebelly Thank you for your reply. I changed the code and removed the slice operation, but I still get an error in the model.

Have you seen this error before?

Jan 15 '18 09:01 lzj322

Reverting to this commit may help solve all the problems. After that, try to make the code compatible with Python 3.

Jan 15 '18 09:01 iliadsouti

Hi everyone, it seems there are some problems with the VOC dataset loader. I will check the code and fix the bugs later on (I am busy with other things currently :-( ).

Jan 15 '18 10:01 zijundeng

@iliadsouti It really works, thank you very much.

Jan 15 '18 15:01 lzj322

+1, confirmed that @iliadsouti's suggestion of reverting to that commit averts the issue with the input to batch norm for PSPNet. If I have time I'll try to PR a fix to current master.

Jan 31 '18 02:01 peteflorence
