Deformable-ConvNets

Exception while resuming training: assert isinstance(step, list) and len(step) >= 1 AssertionError

Open pervaizniazi opened this issue 6 years ago • 1 comments

Hello, I need to resume training, but I am getting the following exception:

    File "experiments/fpn/../../fpn/../lib/utils/lr_scheduler.py", line 29, in __init__
        assert isinstance(step, list) and len(step) >= 1
    AssertionError

I have made the following changes in the .yaml file:

begin_epoch: 76
end_epoch: 100

Any help will be much appreciated.

Thanks

pervaizniazi, Jan 31 '19 12:01

The problem originates in the lr_step field of the config file:

lr: 0.0005
lr_step: '4.83'
warmup: true
warmup_lr: 0.00005
# typically we will use 4000 warmup step for single GPU on VOC
warmup_step: 1000

In the call to get the learning rate scheduler:

    # decide learning rate
    base_lr = lr
    lr_factor = config.TRAIN.lr_factor
    lr_epoch = [float(epoch) for epoch in lr_step.split(',')]
    lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]
    lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))
    lr_iters = [int(epoch * len(roidb) / batch_size) for epoch in lr_epoch_diff]
    print('lr', lr, 'lr_epoch_diff', lr_epoch_diff, 'lr_iters', lr_iters)
    lr_scheduler = WarmupMultiFactorScheduler(lr_iters, lr_factor, config.TRAIN.warmup,
                                              config.TRAIN.warmup_lr, config.TRAIN.warmup_step)

Note that the step argument in your failing assertion is lr_iters. If you follow the logic here, lr_epoch = [4.83], so lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch] is an empty list: the if condition is never satisfied when begin_epoch is greater than every value in lr_step. lr_iters is built from lr_epoch_diff, so it is empty as well, and the len(step) >= 1 check fails.
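To see this without running the trainer, here is a minimal sketch that copies only the arithmetic from the snippet above into a standalone function. The function name schedule_inputs and the values for lr_factor, num_images and batch_size are assumptions made for illustration (num_images stands in for len(roidb)); they are not taken from the repository's config.

```python
def schedule_inputs(lr_step, begin_epoch, base_lr, lr_factor, num_images, batch_size):
    """Reproduce the lr / lr_iters arithmetic from the training script (no MXNet needed)."""
    lr_epoch = [float(epoch) for epoch in lr_step.split(',')]
    # Only decay epochs that lie strictly after begin_epoch survive this filter.
    lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]
    # Decay steps that were already passed are folded into the starting lr.
    lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))
    lr_iters = [int(epoch * num_images / batch_size) for epoch in lr_epoch_diff]
    return lr, lr_iters


# Assumed values for illustration only: lr_factor, num_images, batch_size.
ASSUMED = dict(base_lr=0.0005, lr_factor=0.1, num_images=1000, batch_size=1)

# Resuming at epoch 76 with lr_step '4.83': every decay epoch is already behind us.
lr, lr_iters = schedule_inputs('4.83', begin_epoch=76, **ASSUMED)
print('resume:', lr, lr_iters)

# Starting from epoch 0 with the same lr_step: one decay step remains.
fresh_lr, fresh_iters = schedule_inputs('4.83', begin_epoch=0, **ASSUMED)
print('fresh: ', fresh_lr, fresh_iters)

# This is the condition the scheduler asserts on; it is False in the resume case.
print('assert would pass on resume:', isinstance(lr_iters, list) and len(lr_iters) >= 1)
```

In the resume case lr_iters comes back empty, which is exactly what trips the assertion, while the starting learning rate has already been multiplied by lr_factor once for the decay step that was skipped.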

I don't have a fix for this. I'd be happy to see more action here; it's a pretty serious flaw.

bfialkoff, May 16 '19 10:05