
Differences in Validation results when training in Pascal Context Dataset

Open alexlopezcifuentes opened this issue 4 years ago • 6 comments

Hi Hang Zhang!

First I want to thank you for the amazing repository.

I'm trying to train DeepLabv3 with a ResNeSt-101 backbone (DeepLab_ResNeSt101_PContext) for semantic segmentation on the Pascal Context dataset. The code runs without any issue; however, my results are still below those of the pre-trained model that you provide at https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html :

| Model | Pix Accuracy | mIoU   |
|-------|--------------|--------|
| Mine  | 79.1 %       | 52.1 % |
| Yours | 81.9 %       | 56.5 % |

I'm using the exact same hyperparameters as you, with the following training command: python train.py --dataset pcontext --model deeplab --aux --backbone resnest101

Is there something I'm missing for reaching your results? I assume that your model is trained using the Auxiliary Loss but not the Semantic Encoding Loss. Are you maybe using some pre-training data?

Thanks in advance!

Alex.

alexlopezcifuentes avatar Nov 06 '20 09:11 alexlopezcifuentes

Hi Alex,

Are you using a batch size of 16? This is very important.

zhanghang1989 avatar Nov 07 '20 03:11 zhanghang1989

Did you test the pretrained model using this script?

https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained

zhanghang1989 avatar Nov 07 '20 03:11 zhanghang1989

Hi!

Unfortunately, my GPU does not have enough memory to fit a batch size of 16, so I'm trying to simulate it with gradient accumulation. I suppose that is the main problem; I was asking in case I had missed something else.
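For reference, a minimal sketch of the gradient-accumulation idea I'm using (the model, loss, and loader below are stand-ins for the real training objects, not the repository's code): a per-GPU batch of 4 accumulated over 4 steps approximates an effective batch of 16 for the gradients.

```python
import torch

# Stand-in model, loss, optimizer, and data loader (hypothetical, for illustration).
model = torch.nn.Linear(8, 2)
init_weight = model.weight.detach().clone()  # snapshot to verify updates happen
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 4  # 4 samples/batch * 4 steps = effective batch of 16
optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):
    loss = criterion(model(images), targets)
    # Scale the loss so accumulated gradients average over the effective batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this only matches the gradients, not the BatchNorm statistics: each forward pass still normalizes over the small per-step batch, which may explain part of the accuracy gap.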

I do use your testing script (https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained).

So I assume the only remaining problem is the batch size, which unfortunately has no easy solution...

alexlopezcifuentes avatar Nov 09 '20 11:11 alexlopezcifuentes

You may try the PyTorch checkpoint option (torch.utils.checkpoint), which reduces memory usage by recomputing activations during the backward pass.
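Assuming this refers to `torch.utils.checkpoint`, a minimal sketch with a toy sequential stack (the layer sizes and segment count are illustrative, not the repository's configuration): activations inside each segment are discarded during the forward pass and recomputed during backward, trading compute for memory.

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy deep stack standing in for a real backbone (hypothetical sizes).
blocks = torch.nn.Sequential(
    *[torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()) for _ in range(8)]
)

x = torch.randn(4, 32, requires_grad=True)

# Split the stack into 4 segments; only segment boundaries keep activations,
# so peak memory drops roughly with the number of segments.
out = checkpoint_sequential(blocks, 4, x)
out.sum().backward()
```

Checkpointing changes memory use, not the optimization itself, so it lets you fit a larger true batch size rather than approximating one.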

zhanghang1989 avatar Nov 10 '20 05:11 zhanghang1989

Thanks for the suggestion. I tried it, and although it saves GPU memory, the performance of the final model is worse than the one trained with a lower batch size.

Can I ask which GPU you used to train the model, and how much memory it had? I'd like to know approximately how much memory I'll need.

alexlopezcifuentes avatar Nov 17 '20 11:11 alexlopezcifuentes

For the experiments in the paper, I used an AWS EC2 p3dn.24xlarge instance with 8x 32GB V100 GPUs, but that may not be necessary. 16 GB per GPU should be enough for most of the experiments.

zhanghang1989 avatar Nov 17 '20 18:11 zhanghang1989