PyTorch-Encoding
Differences in validation results when training on the Pascal Context dataset
Hi Hang Zhang!
First I want to thank you for the amazing repository.
I'm trying to train DeepLabv3 with a ResNeSt-101 backbone (DeepLab_ResNeSt101_PContext) for semantic segmentation on the Pascal Context dataset. The code runs without any issues; however, my results are still below those of the pre-trained model that you provide at https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html:
| Model | pixAcc | mIoU |
|---|---|---|
| Mine | 79.1% | 52.1% |
| Yours | 81.9% | 56.5% |
I'm using the exact same hyperparameters as you and using the following training command:
`python train.py --dataset pcontext --model deeplab --aux --backbone resnest101`
Is there something I'm missing to reach your results? I assume that your model is trained with the auxiliary loss but without the semantic encoding loss. Are you perhaps using some extra pre-training data?
Thanks in advance!
Alex.
Hi Alex,
Are you using a batch size of 16? This is very important.
Did you test the pretrained model using this script?
https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained
Hi!
Unfortunately, my GPU does not have enough memory to fit a batch size of 16, so I'm trying to simulate it with gradient accumulation. I suppose that is the main problem; I was asking in case I had missed something else.
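For reference, my accumulation loop looks roughly like the sketch below (a minimal, self-contained example with a toy model and random data standing in for my actual training script; none of these names come from the repo):

```python
import torch
import torch.nn as nn

# Toy model/data for illustration only; in the real script these are the
# DeepLab model, the segmentation loss, and the Pascal Context loader.
num_classes = 21     # the exact value doesn't matter for this sketch
model = nn.Conv2d(3, num_classes, kernel_size=1)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

physical_batch = 4   # what fits in GPU memory
accum_steps = 4      # 4 x 4 = effective batch size of 16

model.train()
optimizer.zero_grad()
for step in range(16):  # stands in for iterating the DataLoader
    images = torch.randn(physical_batch, 3, 64, 64)
    targets = torch.randint(0, num_classes, (physical_batch, 64, 64))
    loss = criterion(model(images), targets)
    # Scale the loss so the accumulated gradient matches one large batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

One caveat I'm aware of: this only matches the gradients. The BatchNorm statistics are still computed over the small physical batch, so it is not fully equivalent to a true batch of 16, which may itself explain part of the gap.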
I do use your testing script (https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained).
So I assume the only problem is the batch size, which is a problem with no easy solution...
You may try the PyTorch checkpoint option (`torch.utils.checkpoint`), which reduces memory usage.
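A minimal sketch of what that looks like (the wrapped block here is just an illustrative `nn.Sequential`, not the actual ResNeSt/DeepLab modules):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Illustrative block only; in practice you would wrap the memory-heavy
# backbone stages so their activations are recomputed during backward
# instead of being kept in GPU memory.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(2, 64, 128, 128, requires_grad=True)

# checkpoint() trades compute for memory: activations inside `block`
# are discarded after the forward pass and recomputed in the backward pass.
out = checkpoint(block, x)
out.sum().backward()
```

The cost is roughly one extra forward pass through the wrapped modules per iteration.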
Thanks for the suggestion. I tried it, and although it does save GPU memory, the performance of the final model is worse than the one trained with a lower batch size.
Can I ask which GPU you used to train the model, and how much memory it had? I'd like to know roughly how much memory I'll need.
For the experiments in the paper, I used an AWS EC2 p3dn.24xlarge instance with 8x 32GB V100 GPUs, but that may not be necessary; 16GB per GPU should be enough for most of the experiments.