mmengine
[Bug] Crash after val epoch with ReduceLROnPlateau
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmengine).
Environment
I'm training an MMOCR model, which has no loss calculation in the validation epoch, so I end up with this error:
https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/scheduler/param_scheduler.py#L1488
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/hooks/param_scheduler_hook.py", line 120, in after_val_epoch
step(runner.param_schedulers)
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/hooks/param_scheduler_hook.py", line 117, in step
scheduler.step(metrics)
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 1488, in step
raise KeyError(f'Excepted key in {list(metrics.keys())},'
I think this error is worth revisiting, for example by adding a new option that controls whether it is raised at all.
As I understand it, when there is no value for the monitored key, there is nothing to calculate.
The same happens the other way around: if you set the scheduler to monitor a validation metric (rather than loss), an error is raised as well.
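To illustrate the idea, here is a minimal standalone sketch; it is not mmengine's code. The function name `pick_monitored_value` and the `strict` option are hypothetical, made up only for this example. With `strict=True` it raises a `KeyError` like the one in the traceback; with `strict=False` it returns `None` so the caller can skip the step.

```python
from typing import Dict, Optional


def pick_monitored_value(metrics: Dict[str, float],
                         monitor: str = 'loss',
                         strict: bool = True) -> Optional[float]:
    """Return metrics[monitor], or handle a missing key.

    `strict` is a hypothetical option, not an existing mmengine parameter.
    """
    if monitor in metrics:
        return metrics[monitor]
    if strict:
        # Similar to the behaviour reported in the traceback.
        raise KeyError(f'Expected key in {list(metrics.keys())}, '
                       f'but got key {monitor} is not in dict')
    # Nothing to monitor in this call: signal the caller to skip the step.
    return None


# Validation metrics as in the report: accuracy only, no 'loss'.
val_metrics = {'CocoOCRDataset/recog/word_acc': 0.91}

try:
    pick_monitored_value(val_metrics, monitor='loss')
except KeyError as err:
    print('strict mode:', err)

print('lenient mode:', pick_monitored_value(val_metrics, strict=False))
```

Whether such an option should default to raising or to skipping is a design decision for the maintainers; the sketch only shows that both behaviours can coexist.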
Reproduces the problem - code sample
dict(
type='ReduceOnPlateauLR',
monitor='loss',
patience=5,
factor=0.5,
begin=int(EPOCH_COUNT * 0.1),
),
Reproduces the problem - command or script
python3 train.py any_config_with_scheduler
Reproduces the problem - error message
Traceback (most recent call last):
File "./train.py", line 123, in <module>
main()
File "./train.py", line 119, in main
runner.train()
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1706, in train
model = self.train_loop.run() # type: ignore
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/runner/loops.py", line 102, in run
self.runner.val_loop.run()
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/runner/loops.py", line 367, in run
self.runner.call_hook('after_val_epoch', metrics=metrics)
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1768, in call_hook
getattr(hook, fn_name)(self, **kwargs)
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/hooks/param_scheduler_hook.py", line 120, in after_val_epoch
step(runner.param_schedulers)
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/hooks/param_scheduler_hook.py", line 117, in step
scheduler.step(metrics)
File "/home/rdl/.local/lib/python3.8/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 1488, in step
raise KeyError(f'Excepted key in {list(metrics.keys())},'
KeyError: "Excepted key in ['CocoOCRDataset/recog/word_acc', 'CocoOCRDataset/recog/word_acc_ignore_case', 'CocoOCRDataset/recog/word_acc_ignore_case_symbol', 'CocoOCRDataset/recog/char_recall', 'CocoOCRDataset/recog/char_precision', 'CocoOCRDataset/recog/1-N.E.D'], but got key loss is not in dict"
Additional information
No response
Hi @MiXaiLL76, thanks for your feedback. Could you help us refine the error message?
Same problem here: key loss is not in dict.
Maybe here (https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/scheduler/param_scheduler.py#L1505) it would be better to log a warning instead of raising? Metrics other than loss are calculated only once per epoch, during validation, while the scheduler itself can also be stepped outside of validation.
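A rough sketch of what "warn instead of raise" could look like, using a toy scheduler rather than the real mmengine class. `ToyPlateauScheduler` and its attributes are assumptions made for illustration; only the `monitor`, `patience` and `factor` names are taken from the config above.

```python
import warnings
from typing import Dict


class ToyPlateauScheduler:
    """Toy stand-in for a reduce-on-plateau scheduler (not mmengine code)."""

    def __init__(self, lr: float, monitor: str = 'loss',
                 patience: int = 5, factor: float = 0.5) -> None:
        self.lr = lr
        self.monitor = monitor
        self.patience = patience
        self.factor = factor
        self.best = float('inf')
        self.num_bad_steps = 0

    def step(self, metrics: Dict[str, float]) -> None:
        if self.monitor not in metrics:
            # Warn and leave the learning rate untouched instead of raising.
            warnings.warn(
                f'key {self.monitor!r} not in {list(metrics)}; skipping step')
            return
        value = metrics[self.monitor]
        if value < self.best:
            # Improvement: remember it and reset the plateau counter.
            self.best = value
            self.num_bad_steps = 0
        else:
            self.num_bad_steps += 1
        if self.num_bad_steps > self.patience:
            # Plateau lasted longer than `patience` steps: reduce the rate.
            self.lr *= self.factor
            self.num_bad_steps = 0


scheduler = ToyPlateauScheduler(lr=0.1, patience=1)

# A validation call without 'loss' only produces a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    scheduler.step({'CocoOCRDataset/recog/word_acc': 0.9})

# Calls that do carry 'loss' drive the plateau logic as usual.
for loss in (1.0, 1.0, 1.0):
    scheduler.step({'loss': loss})

print(len(caught), scheduler.lr)
```

The trade-off is that a misspelled `monitor` key would then only show up as a repeated warning, so the message should still list the available keys.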