
Differences in Validation results when training in Pascal Context Dataset

Open alexlopezcifuentes opened this issue 4 years ago • 6 comments

Hi Hang Zhang!

First I want to thank you for the amazing repository.

I'm trying to train DeepLabv3 with a ResNeSt-101 backbone (DeepLab_ResNeSt101_PContext) for semantic segmentation on the Pascal Context dataset. The code runs without any issue; however, my results are still below those of the pre-trained model that you provide at https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html :

| Model | Pix Accuracy | mIoU   |
|-------|--------------|--------|
| Mine  | 79.1 %       | 52.1 % |
| Yours | 81.9 %       | 56.5 % |

I'm using the exact same hyperparameters as you, with the following training command: python train.py --dataset pcontext --model deeplab --aux --backbone resnest101

Is there something I'm missing for reaching your results? I assume that your model is trained using the Auxiliary Loss but not the Semantic Encoding Loss. Are you maybe using some pre-training data?

Thanks in advance!

Alex.

alexlopezcifuentes avatar Nov 06 '20 09:11 alexlopezcifuentes

Hi Alex,

Are you using a batch size of 16? This is very important.

zhanghang1989 avatar Nov 07 '20 03:11 zhanghang1989

Did you test the pretrained model using this script?

https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained

zhanghang1989 avatar Nov 07 '20 03:11 zhanghang1989

Hi!

Unfortunately, my GPU does not have enough memory to fit a batch size of 16, so I'm trying to simulate it with gradient accumulation. I suppose that is the main problem; I was asking in case I had missed something else.
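For reference, a minimal sketch of the gradient-accumulation idea I'm using (the model, loss, and loader below are stand-ins for the real training objects, not the repository's code): a per-GPU batch of 4 accumulated over 4 steps approximates an effective batch of 16 for the gradients.

```python
import torch

# Stand-in model, loss, optimizer, and data loader (hypothetical, for illustration).
model = torch.nn.Linear(8, 2)
init_weight = model.weight.detach().clone()  # snapshot to verify updates happen
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 4  # 4 samples/batch * 4 steps = effective batch of 16
optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):
    loss = criterion(model(images), targets)
    # Scale the loss so accumulated gradients average over the effective batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this only matches the gradients, not the BatchNorm statistics: each forward pass still normalizes over the small per-step batch, which may explain part of the accuracy gap.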

I do use your testing script (https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained).

So I assume the only remaining problem is the batch size, which unfortunately has no easy solution...

alexlopezcifuentes avatar Nov 09 '20 11:11 alexlopezcifuentes

You may try the PyTorch checkpoint option (torch.utils.checkpoint), which reduces memory usage by recomputing activations during the backward pass.
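Assuming this refers to `torch.utils.checkpoint`, a minimal sketch with a toy sequential stack (the layer sizes and segment count are illustrative, not the repository's configuration): activations inside each segment are discarded during the forward pass and recomputed during backward, trading compute for memory.

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy deep stack standing in for a real backbone (hypothetical sizes).
blocks = torch.nn.Sequential(
    *[torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()) for _ in range(8)]
)

x = torch.randn(4, 32, requires_grad=True)

# Split the stack into 4 segments; only segment boundaries keep activations,
# so peak memory drops roughly with the number of segments.
out = checkpoint_sequential(blocks, 4, x)
out.sum().backward()
```

Checkpointing changes memory use, not the optimization itself, so it lets you fit a larger true batch size rather than approximating one.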

zhanghang1989 avatar Nov 10 '20 05:11 zhanghang1989

Thanks for the suggestion. I tried it, and although it saves GPU memory, the performance of the final model is worse than the one trained with a lower batch size.

Can I ask which GPU you used to train the model, and how much memory it had? I'd like to know approximately how much memory I'll need.

alexlopezcifuentes avatar Nov 17 '20 11:11 alexlopezcifuentes

For the experiments in the paper, I used an AWS EC2 p3dn.24xlarge instance with 8x 32GB V100 GPUs, but that may not be necessary. 16 GB per GPU should be enough for most of the experiments.

zhanghang1989 avatar Nov 17 '20 18:11 zhanghang1989