
Missing argument error in trainer.py when setting lr_scheduler

Open · bc-bytes opened this issue 1 year ago · 3 comments

In trainer.py (line 101) there is a missing argument. Here's how I define the optimizer and lr_scheduler in train.py:

import torch
from torch.optim import lr_scheduler

# Trainer and BCEDiceLoss are imported from this repo's backbones_unet package
optimizer = torch.optim.Adam(params, lr=0.001, weight_decay=0.0001)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=6, verbose=1, min_lr=0.000001)
trainer = Trainer(
    model,
    criterion=BCEDiceLoss(),
    optimizer=optimizer,
    lr_scheduler=scheduler,
    epochs=500
)

That throws the following error:

File "/home/seg/backbones_unet/utils/trainer.py", line 127, in _train_one_epoch
    self.lr_scheduler.step() # this was originally: self.lr_scheduler.step()
TypeError: step() missing 1 required positional argument: 'metrics'
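
For context, ReduceLROnPlateau.step() requires the metric it monitors (here, the loss), while the other built-in PyTorch schedulers take no arguments in step(). A minimal sketch of a helper that handles both cases (step_scheduler is a hypothetical name, not part of this repo):

import torch

def step_scheduler(scheduler, metric=None):
    # ReduceLROnPlateau.step() requires the monitored metric;
    # other built-in schedulers' step() takes no arguments.
    if isinstance(scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau):
        scheduler.step(metric)
    else:
        scheduler.step()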

If I then set line 101 to self.lr_scheduler.step(loss), that seems to fix the error. However, when I start training I get this:

Training Model on 500 epochs:   0%|                            | 0/500 [00:00<?, ?it/s]
Epoch 00032: reducing learning rate of group 0 to 1.0000e-04.  | 31/800 [00:03<00:54, 14.03 training-batch/s, loss=1.1]
Epoch 00054: reducing learning rate of group 0 to 1.0000e-05.  | 53/800 [00:04<00:51, 14.38 training-batch/s, loss=1.09]
Epoch 00062: reducing learning rate of group 0 to 1.0000e-06.  | 61/800 [00:05<00:54, 13.45 training-batch/s, loss=1.06]
Epoch 1:  13%|██████████████████                              | 104/800 [00:08<00:58, 11.84 training-batch/s, loss=1.05]

I haven't seen that before when training models with code from other repos. If that is normal, then all is OK; I just wanted to report the missing argument error in trainer.py.
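
For reference, those messages suggest the scheduler is being stepped once per training batch rather than once per epoch: the "Epoch 00032" counter in the ReduceLROnPlateau output is the scheduler's internal step count, which here tracks the batch index (31, 53, 61 in the progress bar), so with patience=6 the learning rate collapses from 1e-3 to min_lr within the first epoch. ReduceLROnPlateau is normally stepped once per epoch on a validation metric, roughly like this (train_one_epoch and evaluate are hypothetical helpers, not this repo's API):

# Sketch of the usual per-epoch pattern; train_one_epoch and evaluate
# are hypothetical helpers, not functions from this repo.
for epoch in range(500):
    train_one_epoch(model, train_loader, optimizer)  # optimizer.step() per batch
    val_loss = evaluate(model, val_loader)           # metric the scheduler monitors
    scheduler.step(val_loss)                         # one scheduler step per epoch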

bc-bytes · Apr 27 '23 09:04