pytorch-semseg

Loss increases a lot when training on pspnet

Open sanweiliti opened this issue 6 years ago • 5 comments

Hi, I got a validation result of ~78% mIoU on Cityscapes with the pspnet model, but when I try to fine-tune this model on the Cityscapes training set, the training loss and validation loss become extremely high after a single backpropagation step, and the mIoU drops a lot. Does anyone know why? Does this have anything to do with batch normalization?
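
For readers weighing the batch-normalization question, here is a minimal arithmetic sketch of one possible contributor. It is not a confirmed diagnosis of this issue. It assumes the model uses standard BatchNorm layers with PyTorch's default `momentum=0.1`. The helper `update_running_stats` and all numbers are hypothetical and only mirror the running-statistics update rule documented for `torch.nn.BatchNorm2d`.

```python
from statistics import fmean, variance


def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Moving-average update that BatchNorm applies during a train-mode forward pass.

    Hypothetical helper mirroring the documented PyTorch rule:
        new = (1 - momentum) * old + momentum * batch_statistic
    The running variance is fed with the unbiased batch variance.
    """
    batch_mean = fmean(batch)
    batch_var = variance(batch)  # unbiased estimate (divides by n - 1)
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


# Hypothetical values: statistics stored in a checkpoint for one channel,
# and that channel's activations from a single small batch.
pretrained_mean, pretrained_var = 5.0, 4.0
activations = [0.0, 1.0, 2.0, 1.0]

new_mean, new_var = update_running_stats(pretrained_mean, pretrained_var, activations)
print(new_mean, new_var)
```

A single training iteration therefore moves the statistics used at validation time 10% of the way toward whatever one small batch happened to contain, on top of the weight update itself. Whether that is enough to explain the drop reported here is an open question.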

sanweiliti avatar Nov 25 '18 17:11 sanweiliti

Could you share your training settings (e.g., optimizer, learning rate, image size, ... in the config file)?

adam9500370 avatar Nov 26 '18 04:11 adam9500370

Hi, I'm using the following config:

model:
    arch: pspnet
    version: cityscapes

data:
    dataset: cityscapes
    train_split: train
    val_split: val
    test_split: test
    img_rows: 257
    img_cols: 513
    img_norm: False
    path: ./datasets/cityscapes
    version: pascal # pascal mean for pspNet

training:
    train_iters: 1000
    batch_size: 2
    val_interval: 5
    n_workers: 2
    print_interval: 1
    optimizer:
        name: 'adam'
        lr: 1.0e-4
    loss:
        name: 'multi_scale_cross_entropy'
        size_average: True
    lr_schedule:
    resume:  

I load the trained weights via the load_pretrained_model() function, which works fine for validation. Because of the reduced resolution, this config only reaches ~61% mIoU on validation, but after training for one iteration the mIoU drops to 40% and never recovers to 61%. I just used the normal training procedure in train.py, nothing special.
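
One experiment that may help isolate the cause with `batch_size: 2` is to fine-tune with the BatchNorm layers frozen. The sketch below is a common PyTorch pattern, not something taken from this repo's train.py. It assumes the network is built from standard `torch.nn` BatchNorm modules. `freeze_batchnorm` is a hypothetical helper, and `toy` is a tiny stand-in network, not pspnet.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def freeze_batchnorm(model: nn.Module) -> nn.Module:
    """Put every BatchNorm layer in eval mode and stop training its scale and shift."""
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.eval()  # use stored running stats and stop updating them
            for param in module.parameters():
                param.requires_grad = False
    return model


# Tiny stand-in network (hypothetical, NOT pspnet).
toy = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU())

toy.train()            # train() switches BatchNorm back to train mode...
freeze_batchnorm(toy)  # ...so the freeze must be re-applied after each train() call

stats_before = toy[1].running_mean.clone()
toy(torch.randn(2, 3, 8, 8))  # forward pass with a batch of 2
stats_after = toy[1].running_mean

print(torch.equal(stats_before, stats_after))
```

If the loss still explodes with BatchNorm frozen, the running statistics are probably not the main cause, and the optimizer settings or the loss would be the next things to examine.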

sanweiliti avatar Nov 26 '18 10:11 sanweiliti

This also happens to me when I try to train with resized images. +1

Edit: also, I'm training with batch size 8, so I suppose there is a problem with the training procedure.

fabvio avatar Dec 04 '18 10:12 fabvio

Did you solve this problem? I am facing the same problem.

w777kk avatar Jun 18 '19 09:06 w777kk

No, I had to change the training routine. I suppose that some of the strategies implemented in this repo simply don't work with huge architectures like pspnet.

fabvio avatar Jun 18 '19 09:06 fabvio