pytorch-semantic-segmentation
Crashing intermittently when training psp_net with VOC dataset.
I have been getting this error consistently before it manages to complete a single epoch:
[epoch 1], [iter 630 / 8498], [train main loss 1.32866], [train aux loss 1.31173]. [lr 0.0049055503]
Traceback (most recent call last):
File "train.py", line 252, in
I met the same problem. It seems the data augmentation code doesn't work correctly. Can you check it? @ZijunDeng
@andreasrobinson @lzj322 Have you fixed this problem? I met the same problem too.
I found it!
This error is caused by some of the training data; maybe something went wrong when preprocessing those samples, so I just deleted them and my training code can run.
Just delete these lines in train.txt: 724, 1237, 3572, 3920, 4688, 7031.
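For anyone who wants to script this, here is a minimal sketch (not part of the repo; the output filename train_clean.txt is my own choice) that prunes those entries, assuming the numbers above are 1-based line numbers in train.txt:

```python
# Prune the reported bad entries from the VOC train list.
# Assumes the quoted indices are 1-based line numbers in train.txt.
BAD_LINES = {724, 1237, 3572, 3920, 4688, 7031}

with open('train.txt') as f:
    lines = f.readlines()

kept = [line for i, line in enumerate(lines, start=1) if i not in BAD_LINES]

with open('train_clean.txt', 'w') as f:  # point the dataset loader at this file
    f.writelines(kept)
```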
@littlebelly, did you get a successful training result? Did you change any of the author's code?
I can train the network after deleting the training data I mentioned above, but there are still some errors in the validation process, so I just gave up and used the Caffe code provided by the official author.
I suspect these errors are caused by the image slicing operation, which is needed for the Cityscapes dataset because of its large image size but is unnecessary for the VOC2012 dataset.
I hope @ZijunDeng can help solve these errors~
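To illustrate the idea: the names below are hypothetical (this is not the repo's actual loader code), but a guard like this would tile only images larger than the crop size, so typical VOC2012 images pass through whole while large Cityscapes images still get sliced:

```python
from PIL import Image

def maybe_slice(img: Image.Image, crop_size: int):
    """Tile an image into crop_size x crop_size pieces only when needed."""
    w, h = img.size
    if w <= crop_size and h <= crop_size:
        return [img]  # small image (e.g. VOC2012): no slicing required
    crops = []
    for top in range(0, h, crop_size):
        for left in range(0, w, crop_size):
            crops.append(img.crop((left, top,
                                   min(left + crop_size, w),
                                   min(top + crop_size, h))))
    return crops
```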
@littlebelly
Thank you for your reply. I changed the code and removed the slicing operation, but I still meet a problem in the model.

Have you met this error before?
Reverting to this commit may help solve all the problems. After that, try to make the code compatible with Python 3.
Hi everyone, it seems there are some problems with the VOC dataset loader. I will check the code and fix the bugs later on (I am busy with other things currently :-( ).
@iliadsouti It really works, thank you very much.
+1, confirmed that @iliadsouti's suggestion of reverting to that commit averts the issue with the input to batch norm for PSPNet. If I have time, I'll try to PR a fix to current master.
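For reference, here is my guess at the failure mode (an assumption, not taken from the repo): PSPNet's pyramid pooling has a global 1x1 branch, and in training mode PyTorch's BatchNorm2d rejects input with only one value per channel, which happens whenever the last batch of an epoch contains a single sample:

```python
import torch
import torch.nn as nn

# In train mode, BatchNorm2d needs more than one value per channel.
# A (1, C, 1, 1) tensor, as the 1x1 pyramid-pooling branch produces
# for a batch of size 1, triggers a ValueError.
bn = nn.BatchNorm2d(512).train()
pooled = torch.randn(1, 512, 1, 1)
try:
    bn(pooled)
except ValueError as e:
    print(e)  # "Expected more than 1 value per channel when training, ..."
```

If that is indeed the problem, passing drop_last=True to the training DataLoader is a common workaround, so the final partial batch never reaches size 1.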