pytorch-deeplab-xception
reproduced result is low on ResNet backbone
I tried to reproduce the result (78.43%) for the ResNet backbone reported in README.MD using just this command.
Of course, I prepared the SBD dataset and trained the model on VOC2012 + SBD.
However, my result (75.06%) is lower than the reported one. How can I get the reported result?
Is the reported result (78.43%) obtained by fine-tuning after COCO pre-training?
By the way, the TensorBoard curves look odd: while the training loss converged, the validation loss increased.
I got the same curves. Did you find an explanation? Could the PyTorch version make a difference?
I also obtained a lower result. MIOU 74.58%
I obtained a similar low result too. mIOU 74.15%. Any solution?
@lightas I contacted the owner and he said that he froze batch norm during training.
Be careful, however: the --freeze-bn parameter currently doesn't work in the code!
Since I am interested in Cityscapes, I focused my time on it:
- I tried with frozen batch norm (w/ Cityscapes) but it didn't help.
- I obtained some correct results (74%) (w/ Cityscapes) by lowering the LR at a faster rate using the step scheduler. So maybe you could try the step scheduler?
- I am currently training again (w/ Cityscapes) with the poly scheduler, but changing lr_power in the file utils/lr_scheduler.py:
  self.lr * pow((1 - 1.0 * T / self.N), lr_power)
  I'm trying lr_power=3 and it is giving better results in my case, by lowering the learning rate faster.
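To see why a larger lr_power lowers the learning rate faster, the poly expression can be evaluated on its own. The following is a minimal sketch, not the repository's scheduler: poly_lr is a hypothetical helper, base_lr=0.01 and max_iter=100 are made-up values, and 0.9 is included for comparison only because it is the power commonly quoted for the poly schedule in the DeepLab papers.

```python
def poly_lr(base_lr, cur_iter, max_iter, lr_power):
    """Poly decay: base_lr * (1 - cur_iter / max_iter) ** lr_power."""
    return base_lr * pow(1 - 1.0 * cur_iter / max_iter, lr_power)


if __name__ == "__main__":
    base_lr, max_iter = 0.01, 100  # illustrative values, not taken from the thread
    for cur_iter in (0, 25, 50, 75):
        slow = poly_lr(base_lr, cur_iter, max_iter, 0.9)
        fast = poly_lr(base_lr, cur_iter, max_iter, 3)
        print(f"iter {cur_iter:3d}: power=0.9 -> {slow:.5f}, power=3 -> {fast:.5f}")
```

Both schedules start at base_lr and end at zero, but halfway through training a power of 3 has already cut the rate to one eighth of its initial value (0.5 ** 3 = 0.125).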
Note:
- Train on multiple GPUs, since a large enough batch size is very important. (I used 4 GPUs with bs 16: 4 on each GPU.)
- Train with a small crop size: this allows processing more images per batch and stabilizes the training. (I used 550x550.)
- Remove image-level features in the ASPP module of DeepLab. (Not a big deal though.)
I haven't retried on Pascal VOC yet, so I don't know if this makes a difference.
Also note that the new Torchvision v0.3 has deeplabv3 "built-in".
@theevann Thank you so much. I will try it.
@theevann Hi, would you please tell me why you said that the --freeze-bn parameter doesn't work? I didn't find out why it doesn't work.
@lightas The freeze-bn parameter puts BatchNorm into eval mode at model initialization. But then the training loop calls model.train(), which sets BatchNorm back to training mode...
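The effect is easy to check in isolation. Below is a minimal sketch on a toy model, not the repository's DeepLab: freeze_bn is a hypothetical helper name, and the check only targets nn.BatchNorm2d because that is the only normalization layer in this toy model.

```python
import torch.nn as nn


def freeze_bn(model):
    """Put every BatchNorm2d layer in eval mode so its running stats stop updating."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()


# Toy stand-in for the real network (assumption: made up for illustration).
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
bn = model[1]

freeze_bn(model)                      # freeze once, as at model initialization
frozen_at_init = not bn.training

model.train()                         # what the training loop does every epoch
frozen_after_train = not bn.training  # the freeze has been undone

freeze_bn(model)                      # re-applying the freeze after train() restores it
frozen_after_refreeze = not bn.training

print(frozen_at_init, frozen_after_train, frozen_after_refreeze)
```

This is why a working freeze has to be re-applied after every call to model.train(), rather than once at construction time.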
Regarding "he froze batch norm during training": so synchronized batch norm does not work in this code? Sync BN should improve the performance rather than decrease it.
@theevann I got it. Thank you so much.
@youngwanLEE Hi, how did you generate those curves? Thank you!
def training(self, epoch):
    train_loss = 0.0
    self.model.train()
    for m in self.model.modules():
        if isinstance(m, SynchronizedBatchNorm2d):
            m.eval()
        elif isinstance(m, nn.BatchNorm2d):
            m.eval()
    tbar = tqdm(self.train_loader)
or
def training(self, epoch):
    train_loss = 0.0
    self.model.train()
    if self.args.freeze_bn:
        for m in self.model.modules():
            if isinstance(m, SynchronizedBatchNorm2d):
                m.eval()
            elif isinstance(m, nn.BatchNorm2d):
                m.eval()
    tbar = tqdm(self.train_loader)
    num_img_tr = len(self.train_loader)
def make_data_loader(args, **kwargs):
    if args.dataset == 'pascal':
        train_set = pascal.VOCSegmentation(args, split='train')
        val_set = pascal.VOCSegmentation(args, split='val')
        if args.use_sbd:
            sbd_train = sbd.SBDSegmentation(args, split=['train', 'val'])
            train_set = combine_dbs.CombineDBs([train_set, sbd_train], excluded=[val_set])
        num_class = train_set.NUM_CLASSES
        num_class = (you.nums+1)
@youngwanLEE, what are your training parameters? My max mIoU is 0.61 and I'm confused. Thank you.
@theevann Thanks a lot for sharing the useful modifications. I find that the DeepLabv3+ paper doesn't give the mIoU for the ResNet-101 backbone, only for Xception. And for Cityscapes, the official repo's model zoo only includes the Xception and MobileNetV2 backbones. So how do you know the right mIoU range for DeepLabv3+ with ResNet-101 as the backbone? Also, could you share more details about your experiments on Cityscapes? Did you use the additional coarse data? And did you set the output stride to 8 or 16?
Thanks a lot!
Same problem. I got 74.**% training on VOC2012 + SBD with ResNet. I cannot get 78%. It seems that this code base is not good for reproduction.
What is your training config? My accuracy is 61%. I train
I use "train_voc.py". Actually, I think my problem is exactly the same as that of the author of this issue. But the issue does not seem to be solved.
Basically, what I want to do is reproduce "78.43%" using "train_voc.py". Of course, I prepared the SBD dataset and trained the model on VOC2012 + SBD. But I only got "74%", which is much lower than I expected.
Hi guys, I will share some of my experiment settings.
- Dataset: Cityscapes, without the additional coarse data
- Backbone: ResNet-101
- output_stride: 16
- initial_learning_rate: 0.005
- learning_decay: poly
- training_epochs: 240
- batch_size: 8
- num_gpus: 4
- train_size: 768*768
- no sync_bn
- others: the default values of train.py
Running inference at 2048*1024 on the Cityscapes val set, I got ~74% mIoU. I am not sure if that is the right mIoU.
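Since several comments compare mIoU numbers, it may help to pin down what is being averaged. The following is a minimal pure-Python sketch of mean IoU computed from a confusion matrix; it is not the repository's evaluator. The function names are hypothetical, the label 255 is assumed to mark ignored pixels (the Pascal VOC convention), and skipping classes that appear in neither the ground truth nor the prediction is also an assumption.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count pixels per (ground-truth class, predicted class) pair, skipping ignored labels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue
        cm[g][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur at all."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # pixels of class c predicted as another class
        fp = sum(row[c] for row in cm) - tp   # pixels of other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Flattened toy labels: five pixels, two classes, the last pixel is ignored.
    gt = [0, 0, 1, 1, 255]
    pred = [0, 1, 1, 1, 0]
    print(mean_iou(confusion_matrix(gt, pred, num_classes=2)))
```

Differences in evaluation resolution, ignored labels, or how absent classes are handled can shift the number, so two mIoU values are only comparable when these choices match.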
Hi guys,
For those who want to reproduce results on deeplab V3, I recommend this code: https://github.com/chenxi116/DeepLabv3.pytorch The code can simply reproduce 76.8% mIOU in Pascal val (trained on VOC2012 + SBD).
Can I have your QQ? I got very poor performance on my own dataset, training with 8 GPUs. Thank you very much.
Can I have your QQ number? I really want to get your help. I trained on my dataset with 8 GPUs and got very poor performance.