PSPNet performance on Cityscapes

Open · DiegoOrtego opened this issue 8 years ago · 28 comments

Hi! First of all, thank you very much for sharing this code. My question: I have trained PSPNet on Cityscapes with your default configuration and I am getting ~0.63 mIoU (fine set), which is far from the 0.78 reported in the paper. Could you give me any recommendations for approaching the paper's performance? Does the batch size of 8 (16 in the paper) have a large impact here?

DiegoOrtego avatar Oct 13 '17 12:10 DiegoOrtego

Hello, I have trained PSPNet with the default configuration and got 80% mean IoU. However, this mean IoU seems to be category IoU, not class IoU!

shahabty avatar Oct 13 '17 17:10 shahabty
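
For reference, Cityscapes results are usually reported in two flavors: mIoU over the 19 training classes ("class IoU", the 0.78 number in the paper) and mIoU over the 7 coarser categories ("category IoU"), which is always noticeably higher. Below is a minimal sketch of the difference, assuming predictions and labels are already encoded as trainIds; the mapping array and helper names are illustrative, not the repo's code.

```python
import numpy as np

NUM_CLASSES = 19
# trainId -> category index (flat, construction, object, nature, sky, human, vehicle)
TRAIN_ID_TO_CAT = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6, 6, 6, 6])

def iou_from_hist(hist):
    # hist[i, j] = number of pixels with ground truth i predicted as j
    inter = np.diag(hist)
    union = hist.sum(1) + hist.sum(0) - inter
    return np.where(union > 0, inter / np.maximum(union, 1), np.nan)

def mean_ious(pred, gt):
    valid = gt != 255  # drop the ignore label
    p, g = pred[valid], gt[valid]
    hist = np.bincount(NUM_CLASSES * g + p, minlength=NUM_CLASSES ** 2).reshape(NUM_CLASSES, NUM_CLASSES)
    class_miou = np.nanmean(iou_from_hist(hist))
    # category IoU: merge classes into categories first, then compute IoU
    cat_hist = np.bincount(7 * TRAIN_ID_TO_CAT[g] + TRAIN_ID_TO_CAT[p], minlength=49).reshape(7, 7)
    category_miou = np.nanmean(iou_from_hist(cat_hist))
    return class_miou, category_miou
```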

Could you tell me which configuration that is? I am using:

```python
args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
```

Data augmentation:

```python
train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
```

DiegoOrtego avatar Oct 13 '17 17:10 DiegoOrtego
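
For context on these arguments: 'lr_decay': 0.9 plays the role of the power in the "poly" learning-rate schedule used by the PSPNet/DeepLab papers, and 'max_iter' the total iteration count. A minimal sketch, assuming the training script applies it per iteration:

```python
def poly_lr(base_lr, curr_iter, max_iter, power=0.9):
    # "poly" schedule: decay the learning rate smoothly to zero over max_iter
    # iterations, with `power` controlling the shape of the curve.
    return base_lr * (1 - curr_iter / float(max_iter)) ** power

# e.g. halfway through training with the arguments above:
# poly_lr(1e-2 / (16. / 8) ** 0.5, 5e4, 1e5)  ->  ~0.0038
```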

I got these results from train_coarse_extra.py. Yes, this is my configuration.

shahabty avatar Oct 13 '17 17:10 shahabty

OK, I am using just train_fine.py, but in the paper they report 0.78 mIoU. Thanks!

DiegoOrtego avatar Oct 13 '17 17:10 DiegoOrtego

best record: [val loss 0.11416], [acc 0.97769], [acc_cls 0.86788], [mean_iu 0.80071], [fwavacc 0.95816], [epoch 36]

shahabty avatar Oct 13 '17 18:10 shahabty

Could you copy here the training arguments and data augmentation that you are using?

DiegoOrtego avatar Oct 13 '17 18:10 DiegoOrtego

```python
args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}

train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
val_simul_transform = simul_transforms.Scale(train_args['input_size'])
train_input_transform = standard_transforms.Compose([
    standard_transforms.ToTensor(),
    standard_transforms.Normalize(*mean_std)
])
```

shahabty avatar Oct 13 '17 18:10 shahabty
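
As an aside, the simul_transforms used above apply the same random spatial transform to the image and its label mask so the two stay aligned. A minimal sketch of one such joint transform (illustrative, not the repo's exact implementation):

```python
import random
from PIL import Image

class JointRandomHorizontalFlip(object):
    # Flip image and mask together so pixel labels stay aligned with the image.
    def __call__(self, img, mask):
        if random.random() < 0.5:
            return img.transpose(Image.FLIP_LEFT_RIGHT), mask.transpose(Image.FLIP_LEFT_RIGHT)
        return img, mask
```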

OK, thanks! Any suggestion to improve performance using the fine-annotated Cityscapes data is welcome! I want to avoid using the coarse annotations.

DiegoOrtego avatar Oct 13 '17 18:10 DiegoOrtego

@shahabty @DiegoOrtego Hi, I am also trying to reproduce the result, but with ResNet50.

Thanks for sharing your parameters; I will run the algorithm with the settings below. I have an extra question regarding @shahabty's remark that "this mean IoU seems to be category IoU not class IoU". According to utils/misc.py, the IoU is calculated over the number of classes defined in datasets/cityscapes.py, which is 19.

Am I correct, or did you find anything odd in the code?

```python
args = {
    'train_batch_size': 16,
    'lr': 1e-2,
    'lr_decay': 0.9,
    'max_iter': 9e4,  # the paper says 90K for Cityscapes, so..
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 16,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
```

Jongchan avatar Oct 15 '17 10:10 Jongchan

Hi, I did not find anything odd in the code. I am retraining, keeping the number of iterations but not the batch size (my GPU limits that), and using just the fine-annotated data. This is what I am getting:

[epoch 122], [val loss 0.22252], [acc 0.93157], [acc_cls 0.70841], [mean_iu 0.62067], [fwavacc 0.87792]
best record: [val loss 0.22191], [acc 0.93022], [acc_cls 0.72752], [mean_iu 0.63397], [fwavacc 0.87701], [epoch 91]

Are you also training with the coarse annotations?

DiegoOrtego avatar Oct 15 '17 10:10 DiegoOrtego

Regarding adapting the learning rate to the batch size, I found this, quoting from "One weird trick for parallelizing convolutional neural networks" by Alex Krizhevsky: "Theory suggests that when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant."

So that is why the lr of 0.01 is divided by sqrt(16 / 8) when the batch size drops from 16 to 8, I guess.

DiegoOrtego avatar Oct 15 '17 10:10 DiegoOrtego
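
Worked out with the numbers from this thread (a small illustrative snippet, not from the repo):

```python
import math

base_lr = 1e-2        # learning rate used with batch size 16 in the paper
k = 8 / 16.0          # the batch size is halved here
scaled_lr = base_lr * math.sqrt(k)   # == 1e-2 / math.sqrt(16. / 8) ~= 0.00707
```

For comparison, the later "linear scaling" heuristic (Goyal et al., "Accurate, Large Minibatch SGD") would instead multiply by k, giving 5e-3 here.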

I am also training with the fine dataset only! I am running with 4 Titan Blacks, so I am lucky enough to keep the batch size at 16.

I think I can share my result... in 2 days. According to my calculation, it will take 50 hours to train. Also, I am running some of my own ideas on top of ResNet50.

Jongchan avatar Oct 15 '17 10:10 Jongchan

Great! I am using ResNet101 for PSPNet. Good luck!

DiegoOrtego avatar Oct 15 '17 11:10 DiegoOrtego

It's nice that my code can help you all.

The PSPNet paper says that the model is trained first on the coarse dataset and then fine-tuned on the fine dataset. It's easy to get a high mIoU (0.8+) on the coarse dataset. But I failed to reproduce the performance mentioned in the paper (I only got 0.6+ mIoU on the fine Cityscapes validation set).

The PSPNet authors use a multi-GPU synchronized version of BN, which gives more accurate statistics and hence benefits performance. However, the current version of BN in PyTorch does not support synchronization across GPUs; someone has opened an issue about that. Please tell me if you have any other tricks that can help improve the performance. Thanks.

Looking forward to @Jongchan's experiment results.

zijundeng avatar Oct 15 '17 13:10 zijundeng

Thank you for your nice code :D @ZijunDeng.

P.S. Because I have to run two experiments at the same time, the batch size will be reduced to 8. I will report when it is finished~

Jongchan avatar Oct 15 '17 13:10 Jongchan

This issue may be of interest: https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12

DiegoOrtego avatar Oct 15 '17 14:10 DiegoOrtego

@Jongchan You are right. I went through the code and everything is fine. Since this code is not parallelized, @ZijunDeng, I wasn't sure about the IoU.

shahabty avatar Oct 15 '17 17:10 shahabty

@DiegoOrtego It is helpful! Thanks. I am going to try the sliced prediction mentioned in https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12

zijundeng avatar Oct 16 '17 09:10 zijundeng
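
For anyone else wanting to try it, here is a minimal sketch of sliced (tiled) prediction; the tile and stride values are illustrative, and it assumes the model returns logits at the crop's resolution:

```python
import torch

def _starts(length, tile, stride):
    # crop start positions covering [0, length), with the last tile flush to the edge
    s = list(range(0, max(length - tile, 0) + 1, stride))
    if s[-1] != max(length - tile, 0):
        s.append(max(length - tile, 0))
    return s

def sliced_predict(model, image, num_classes, tile=713, stride=476):
    # Run the network on overlapping crops and average the logits where tiles overlap.
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for y in _starts(H, tile, stride):
        for x in _starts(W, tile, stride):
            crop = image[:, :, y:y + tile, x:x + tile]
            with torch.no_grad():
                out = model(crop)  # assumed to return logits at the crop resolution
            logits[:, :, y:y + crop.shape[2], x:x + crop.shape[3]] += out.cpu()
            counts[:, :, y:y + crop.shape[2], x:x + crop.shape[3]] += 1
    return (logits / counts).argmax(dim=1)
```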

@ZijunDeng Great! Tell us if you are able to improve performance! Good luck!

DiegoOrtego avatar Oct 16 '17 10:10 DiegoOrtego

@DiegoOrtego Sure

zijundeng avatar Oct 16 '17 10:10 zijundeng

Does this link help you? http://hangzh.com/PyTorch-Encoding/syncbn.html

Someone (fmassa) also proposed this: 'What you could do is to fix the batch-norm statistics in this case, or even (and it is what most people do) replace entirely batch-norm with a fixed affine transformation.'

aymenx17 avatar Oct 18 '17 16:10 aymenx17
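
The second option fmassa mentions (replacing BN entirely with a fixed affine transformation) amounts to folding the BN running statistics into a per-channel scale and shift. A minimal sketch, assuming affine BatchNorm2d layers with populated running statistics; this is not this repo's code:

```python
import torch
import torch.nn as nn

class FixedAffine(nn.Module):
    # BatchNorm replaced by the fixed affine transform it applies at inference:
    # y = (x - running_mean) / sqrt(running_var + eps) * weight + bias,
    # folded into a single per-channel scale and shift.
    def __init__(self, bn):
        super().__init__()
        scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
        shift = bn.bias.data - bn.running_mean * scale
        self.register_buffer('scale', scale.view(1, -1, 1, 1))
        self.register_buffer('shift', shift.view(1, -1, 1, 1))

    def forward(self, x):
        return x * self.scale + self.shift
```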

@aymenx17 The link is helpful! Thank you. I plan to try the trick of freezing BN.

zijundeng avatar Oct 19 '17 01:10 zijundeng
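
In case it is useful to others, a minimal sketch of the simpler freeze-BN trick (keep BN in eval mode so it uses its running statistics instead of the noisy small-batch ones); illustrative, not this repo's code:

```python
import torch.nn as nn

def freeze_batchnorm(model):
    # Use the stored running statistics and stop updating the BN affine parameters.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False
```

Note that model.train() puts BN layers back into training mode, so this needs to be re-applied after every call to train().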

Has anyone tried the link provided by @aymenx17, i.e., http://hangzh.com/PyTorch-Encoding/syncbn.html? If so, was anyone able to reproduce the accuracies reported in the paper? TIA

rohitgajawada avatar Jan 04 '18 11:01 rohitgajawada

Where did you find train_coarse_extra.py? @shahabty

IssamLaradji avatar Apr 08 '18 19:04 IssamLaradji

@IssamLaradji It used to be in this repo. I can't find it now.

shahabty avatar Apr 10 '18 13:04 shahabty

FYI, SyncBatchNorm has now been added to PyTorch master via https://github.com/pytorch/pytorch/pull/14267. For documentation, see: https://pytorch.org/docs/master/nn.html#torch.nn.SyncBatchNorm

soumith avatar Apr 07 '19 05:04 soumith
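
A minimal usage sketch (build_pspnet() and local_rank are placeholders; it assumes one process per GPU and that torch.distributed.init_process_group has already been called):

```python
import torch.nn as nn

model = build_pspnet().cuda(local_rank)                  # hypothetical model constructor
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)   # swap BatchNorm layers for SyncBatchNorm
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```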

@soumith Good news! Thanks for the reminder.

zijundeng avatar Apr 07 '19 10:04 zijundeng

"when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant"

@DiegoOrtego Can you elaborate on this?

mrgloom avatar Nov 25 '19 12:11 mrgloom