PSPNet performance on Cityscapes

Open · DiegoOrtego opened this issue 8 years ago · 28 comments

Hi! First of all, thank you very much for sharing this code. My question: I have trained PSPNet on Cityscapes with your default configuration and I am getting ~0.63 mIoU (fine set), which is far from the 0.78 reported in the paper. Could you give me any recommendations for approaching the paper's performance? Does the batch size of 8 (16 in the paper) have a large impact here?

DiegoOrtego avatar Oct 13 '17 12:10 DiegoOrtego

Hello, I have trained PSPNet with the default configuration and got 80% mean IoU. However, this mean IoU seems to be category IoU, not class IoU!

shahabty avatar Oct 13 '17 17:10 shahabty
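
For reference, Cityscapes results are usually reported in two flavors: mIoU over the 19 training classes ("class IoU", the 0.78 number in the paper) and mIoU over the 7 coarser categories ("category IoU"), which is always noticeably higher. Below is a minimal sketch of the difference, assuming predictions and labels are already encoded as trainIds; the mapping array and helper names are illustrative, not the repo's code.

```python
import numpy as np

NUM_CLASSES = 19
# trainId -> category index (flat, construction, object, nature, sky, human, vehicle)
TRAIN_ID_TO_CAT = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6, 6, 6, 6])

def iou_from_hist(hist):
    # hist[i, j] = number of pixels with ground truth i predicted as j
    inter = np.diag(hist)
    union = hist.sum(1) + hist.sum(0) - inter
    return np.where(union > 0, inter / np.maximum(union, 1), np.nan)

def mean_ious(pred, gt):
    valid = gt != 255  # drop the ignore label
    p, g = pred[valid], gt[valid]
    hist = np.bincount(NUM_CLASSES * g + p, minlength=NUM_CLASSES ** 2).reshape(NUM_CLASSES, NUM_CLASSES)
    class_miou = np.nanmean(iou_from_hist(hist))
    # category IoU: merge classes into categories first, then compute IoU
    cat_hist = np.bincount(7 * TRAIN_ID_TO_CAT[g] + TRAIN_ID_TO_CAT[p], minlength=49).reshape(7, 7)
    category_miou = np.nanmean(iou_from_hist(cat_hist))
    return class_miou, category_miou
```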

Could you tell me which configuration that is? I am using:

```python
args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
```

Data augmentation:

```python
train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
```

DiegoOrtego avatar Oct 13 '17 17:10 DiegoOrtego
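
For context on these arguments: 'lr_decay': 0.9 plays the role of the power in the "poly" learning-rate schedule used by the PSPNet/DeepLab papers, and 'max_iter' the total iteration count. A minimal sketch, assuming the training script applies it per iteration:

```python
def poly_lr(base_lr, curr_iter, max_iter, power=0.9):
    # "poly" schedule: decay the learning rate smoothly to zero over max_iter
    # iterations, with `power` controlling the shape of the curve.
    return base_lr * (1 - curr_iter / float(max_iter)) ** power

# e.g. halfway through training with the arguments above:
# poly_lr(1e-2 / (16. / 8) ** 0.5, 5e4, 1e5)  ->  ~0.0038
```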

I got these results from train_coarse_extra.py. Yes, this is my configuration.

shahabty avatar Oct 13 '17 17:10 shahabty

OK, I am using just train_fine.py, but in the paper they report 0.78 mIoU. Thanks!

DiegoOrtego avatar Oct 13 '17 17:10 DiegoOrtego

best record: [val loss 0.11416], [acc 0.97769], [acc_cls 0.86788], [mean_iu 0.80071], [fwavacc 0.95816], [epoch 36]

shahabty avatar Oct 13 '17 18:10 shahabty

Could you copy here the training arguments and data augmentation that you are using?

DiegoOrtego avatar Oct 13 '17 18:10 DiegoOrtego

```python
args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}

train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
val_simul_transform = simul_transforms.Scale(train_args['input_size'])
train_input_transform = standard_transforms.Compose([
    standard_transforms.ToTensor(),
    standard_transforms.Normalize(*mean_std)
])
```

shahabty avatar Oct 13 '17 18:10 shahabty
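
As an aside, the simul_transforms used above apply the same random spatial transform to the image and its label mask so the two stay aligned. A minimal sketch of one such joint transform (illustrative, not the repo's exact implementation):

```python
import random
from PIL import Image

class JointRandomHorizontalFlip(object):
    # Flip image and mask together so pixel labels stay aligned with the image.
    def __call__(self, img, mask):
        if random.random() < 0.5:
            return img.transpose(Image.FLIP_LEFT_RIGHT), mask.transpose(Image.FLIP_LEFT_RIGHT)
        return img, mask
```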

OK, thanks! Any suggestion to improve performance using the fine-annotated Cityscapes data is welcome! I want to avoid using the coarse annotations.

DiegoOrtego avatar Oct 13 '17 18:10 DiegoOrtego

@shahabty @DiegoOrtego Hi, I am also trying to reproduce the result, but with ResNet50.

Thanks for sharing your parameters; I will run the algorithm with the settings below. I have an extra question regarding @shahabty's remark that "this mean IoU seems to be category IoU not class IoU". According to utils/misc.py, the IoU is calculated over the number of classes defined in datasets/cityscapes.py, which is 19.

Am I correct, or did you find anything odd in the code?

```python
args = {
    'train_batch_size': 16,
    'lr': 1e-2,
    'lr_decay': 0.9,
    'max_iter': 9e4,  # the paper says 90K for Cityscapes, so..
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 16,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
```

Jongchan avatar Oct 15 '17 10:10 Jongchan

Hi, I did not find anything odd in the code. I am retraining, keeping the number of iterations but not the batch size (my GPU limits that), and using just the fine-annotated data. This is what I am getting:

[epoch 122], [val loss 0.22252], [acc 0.93157], [acc_cls 0.70841], [mean_iu 0.62067], [fwavacc 0.87792]
best record: [val loss 0.22191], [acc 0.93022], [acc_cls 0.72752], [mean_iu 0.63397], [fwavacc 0.87701], [epoch 91]

Are you also training with the coarse annotations?

DiegoOrtego avatar Oct 15 '17 10:10 DiegoOrtego

Regarding adapting the learning rate to the batch size, I found this, quoting from "One weird trick for parallelizing convolutional neural networks" by Alex Krizhevsky: "Theory suggests that when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant."

So that is why the lr of 0.01 is divided by sqrt(16 / 8) when the batch size drops from 16 to 8, I guess.

DiegoOrtego avatar Oct 15 '17 10:10 DiegoOrtego
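
Worked out with the numbers from this thread (a small illustrative snippet, not from the repo):

```python
import math

base_lr = 1e-2        # learning rate used with batch size 16 in the paper
k = 8 / 16.0          # the batch size is halved here
scaled_lr = base_lr * math.sqrt(k)   # == 1e-2 / math.sqrt(16. / 8) ~= 0.00707
```

For comparison, the later "linear scaling" heuristic (Goyal et al., "Accurate, Large Minibatch SGD") would instead multiply by k, giving 5e-3 here.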

I am also training with the fine dataset only! I am running with 4 Titan Blacks, so I am lucky enough to keep the batch size at 16.

I think I can share my result... in 2 days. According to my calculation, it will take 50 hours to train. Also, I am running some of my own ideas on top of ResNet50.

Jongchan avatar Oct 15 '17 10:10 Jongchan

Great! I am using ResNet101 for PSPNet. Good luck!

DiegoOrtego avatar Oct 15 '17 11:10 DiegoOrtego

It's nice that my code can help you all.

The PSPNet paper says that the model is trained first on the coarse dataset and then fine-tuned on the fine dataset. It's easy to get a high mIoU (0.8+) on the coarse dataset. But I failed to reproduce the performance mentioned in the paper (I only got 0.6+ mIoU on the fine Cityscapes validation set).

The PSPNet authors use a multi-GPU synchronized version of BN, which gives more accurate statistics and hence benefits performance. However, the current version of BN in PyTorch does not support synchronization across GPUs; someone has opened an issue about that. Please tell me if you have any other tricks that can help improve the performance. Thanks.

Looking forward to @Jongchan's experiment results.

zijundeng avatar Oct 15 '17 13:10 zijundeng

Thank you for your nice code :D @ZijunDeng.

P.S. Because I have to run two experiments at the same time, the batch size will be reduced to 8. I will report when it is finished~

Jongchan avatar Oct 15 '17 13:10 Jongchan

This issue may be of interest: https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12

DiegoOrtego avatar Oct 15 '17 14:10 DiegoOrtego

@Jongchan You are right. I went through the code and everything is fine. Since this code is not parallelized, @ZijunDeng, I wasn't sure about the IoU.

shahabty avatar Oct 15 '17 17:10 shahabty

@DiegoOrtego It is helpful! Thanks. I am going to try the sliced prediction mentioned in https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12

zijundeng avatar Oct 16 '17 09:10 zijundeng
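
For anyone else wanting to try it, here is a minimal sketch of sliced (tiled) prediction; the tile and stride values are illustrative, and it assumes the model returns logits at the crop's resolution:

```python
import torch

def _starts(length, tile, stride):
    # crop start positions covering [0, length), with the last tile flush to the edge
    s = list(range(0, max(length - tile, 0) + 1, stride))
    if s[-1] != max(length - tile, 0):
        s.append(max(length - tile, 0))
    return s

def sliced_predict(model, image, num_classes, tile=713, stride=476):
    # Run the network on overlapping crops and average the logits where tiles overlap.
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for y in _starts(H, tile, stride):
        for x in _starts(W, tile, stride):
            crop = image[:, :, y:y + tile, x:x + tile]
            with torch.no_grad():
                out = model(crop)  # assumed to return logits at the crop resolution
            logits[:, :, y:y + crop.shape[2], x:x + crop.shape[3]] += out.cpu()
            counts[:, :, y:y + crop.shape[2], x:x + crop.shape[3]] += 1
    return (logits / counts).argmax(dim=1)
```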

@ZijunDeng Great! Tell us if you are able to improve performance! Good luck!

DiegoOrtego avatar Oct 16 '17 10:10 DiegoOrtego

@DiegoOrtego Sure

zijundeng avatar Oct 16 '17 10:10 zijundeng

Does this link help you? http://hangzh.com/PyTorch-Encoding/syncbn.html

Someone (fmassa) also proposed this: 'What you could do is to fix the batch-norm statistics in this case, or even (and it is what most people do) replace entirely batch-norm with a fixed affine transformation.'

aymenx17 avatar Oct 18 '17 16:10 aymenx17
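
The second option fmassa mentions (replacing BN entirely with a fixed affine transformation) amounts to folding the BN running statistics into a per-channel scale and shift. A minimal sketch, assuming affine BatchNorm2d layers with populated running statistics; this is not this repo's code:

```python
import torch
import torch.nn as nn

class FixedAffine(nn.Module):
    # BatchNorm replaced by the fixed affine transform it applies at inference:
    # y = (x - running_mean) / sqrt(running_var + eps) * weight + bias,
    # folded into a single per-channel scale and shift.
    def __init__(self, bn):
        super().__init__()
        scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
        shift = bn.bias.data - bn.running_mean * scale
        self.register_buffer('scale', scale.view(1, -1, 1, 1))
        self.register_buffer('shift', shift.view(1, -1, 1, 1))

    def forward(self, x):
        return x * self.scale + self.shift
```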

@aymenx17 The link is helpful! Thank you. I plan to try the trick of freezing BN.

zijundeng avatar Oct 19 '17 01:10 zijundeng
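
In case it is useful to others, a minimal sketch of the simpler freeze-BN trick (keep BN in eval mode so it uses its running statistics instead of the noisy small-batch ones); illustrative, not this repo's code:

```python
import torch.nn as nn

def freeze_batchnorm(model):
    # Use the stored running statistics and stop updating the BN affine parameters.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False
```

Note that model.train() puts BN layers back into training mode, so this needs to be re-applied after every call to train().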

Has anyone tried the link provided by @aymenx17, i.e., http://hangzh.com/PyTorch-Encoding/syncbn.html? If so, was anyone able to reproduce the accuracies reported in the paper? TIA

rohitgajawada avatar Jan 04 '18 11:01 rohitgajawada

Where did you find train_coarse_extra.py? @shahabty

IssamLaradji avatar Apr 08 '18 19:04 IssamLaradji

@IssamLaradji It used to be in this repo. I can't find it now.

shahabty avatar Apr 10 '18 13:04 shahabty

FYI, SyncBatchNorm has now been added to PyTorch master via https://github.com/pytorch/pytorch/pull/14267. For documentation, see: https://pytorch.org/docs/master/nn.html#torch.nn.SyncBatchNorm

soumith avatar Apr 07 '19 05:04 soumith
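
A minimal usage sketch (build_pspnet() and local_rank are placeholders; it assumes one process per GPU and that torch.distributed.init_process_group has already been called):

```python
import torch.nn as nn

model = build_pspnet().cuda(local_rank)                  # hypothetical model constructor
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)   # swap BatchNorm layers for SyncBatchNorm
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```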

@soumith Good news! Thanks for the reminder.

zijundeng avatar Apr 07 '19 10:04 zijundeng

"when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant"

@DiegoOrtego Can you elaborate on this?

mrgloom avatar Nov 25 '19 12:11 mrgloom