mmcv
Validation by iteration causes model.training=False to continue until the end of an epoch
See https://github.com/open-mmlab/mmaction2/issues/1781 Thanks @makecent for this issue.
@makecent, sorry for the late reply. Actually, we do not recommend combining EpochBasedRunner with validation by iteration. If you really want to validate by iteration in EpochBasedRunner, you can implement a custom EvalHook and call model.train() at the end of do_evaluate (make sure that no other hook requires the model to be in eval state).
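The workaround described above can be illustrated without mmcv at all. The following is only a minimal sketch: `ToyModel`, `IterEvalHook`, `RestoringIterEvalHook` and `run_epoch` are hypothetical stand-ins, not mmcv's real classes. It assumes an epoch-based loop that calls `model.train()` once at the start of the epoch, and shows why the model stays in eval mode after an iteration-based evaluation and how switching back at the end of `do_evaluate` restores it.

```python
class ToyModel:
    """Stand-in for a torch.nn.Module: it only tracks the training flag."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


class IterEvalHook:
    """Hypothetical hook that evaluates every `interval` iterations."""

    def __init__(self, interval):
        self.interval = interval

    def do_evaluate(self, model):
        model.eval()  # evaluation switches the model to eval mode
        # ... the validation loop would run here ...

    def after_train_iter(self, model, iteration):
        if (iteration + 1) % self.interval == 0:
            self.do_evaluate(model)


class RestoringIterEvalHook(IterEvalHook):
    """Same hook, but it puts the model back into train mode afterwards."""

    def do_evaluate(self, model):
        super().do_evaluate(model)
        model.train()  # only safe if no later hook needs eval mode


def run_epoch(model, hook, num_iters):
    """Mimic an epoch-based loop: train() is called once, at epoch start."""
    model.train()
    modes = []
    for i in range(num_iters):
        modes.append(model.training)  # mode seen by this training step
        hook.after_train_iter(model, i)
    return modes


buggy_modes = run_epoch(ToyModel(), IterEvalHook(interval=2), num_iters=4)
fixed_modes = run_epoch(ToyModel(), RestoringIterEvalHook(interval=2), num_iters=4)
```

In this toy run, `buggy_modes` records that the training steps after the first evaluation see `training == False`, while `fixed_modes` stays in train mode for every step.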
@HAOCHENYE Thanks for the reply. In my case, I want to validate every 0.5 epoch because my dataset is huge (I don't think this is an unusual requirement). I considered using IterBasedRunner, but that would require a lot of custom code because mmaction2 only supports EpochBasedRunner in its apis.train.py.
A custom EvalHook could be a solution, but it still needs a lot of customization.
I still suggest you consider my PR. Although it is quite simple, it is NOT only for solving the problem of validation by iteration with EpochBasedRunner; it also avoids potential bugs caused by a switched training state, which could happen in any hook that implements after_train_iter/before_train_iter.
If you really don't expect users to use mmcv that way, could you please at least consider printing a warning that potential bugs may occur when validation by iteration is triggered in an EpochBasedRunner?
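As a rough illustration of the kind of warning requested here, a check along the following lines could be run when the hooks are set up. This is only a sketch: `check_eval_config` and both of its arguments are hypothetical and are not part of mmcv; the runner name and the `by_epoch` flag are assumed to be available as plain values.

```python
import warnings


def check_eval_config(runner_type, by_epoch):
    """Warn when iteration-based validation is paired with an epoch-based runner.

    Hypothetical helper, not an mmcv API. Returns True if a warning was issued.
    """
    if runner_type == "EpochBasedRunner" and not by_epoch:
        warnings.warn(
            "Validation by iteration inside an epoch-based runner leaves the "
            "model in eval mode until the next epoch starts; later training "
            "iterations in the same epoch may run with model.training=False.",
            UserWarning,
        )
        return True
    return False
```

A caller could invoke this once before training, for example `check_eval_config("EpochBasedRunner", by_epoch=False)`, so that the user sees the risk up front.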
@makecent Hi, thanks for your suggestions. Because some hooks that run after EvalHook may require the model to remain in eval state, we do not call model.train() in EvalHook.
I strongly agree with you that a warning should be raised; the current behavior is not friendly enough.
BTW, we have recently upgraded our training architecture and released a new repo, MMEngine, where you can solve this problem by simply customizing a TrainLoop. MMEngine is currently in public beta, and some of the OpenMMLab repos (including MMAction2, still in progress) have been refactored based on it. You are welcome to try MMEngine; your comments can help us improve it.
Last but not least, please feel free to keep using MMCV; we will still maintain its master branch. All beneficial PRs will be merged into both MMCV and MMEngine.