mmcv
[Fix] Fix invalid access to training-state log while evaluation
Motivation
Previously, the LoggerHook identified the mode of the log_buffer by checking for the 'time' attribute in the runner's log_buffer, which is not reliable and causes an issue (open-mmlab/mmsegmentation#1502). Also, the suggestion in #1252 and #1261 to set the priority of EvalHook to "LOW" was only a workaround and did not address the root cause of this issue.
To solve this issue, the mode should be identified by a reliable state flag so that the LoggerHook can determine the correct mode of the runner's log_buffer during training. Therefore, in this commit, I add a 'log_state' attribute that is set during evaluation to help identify the correct mode.
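The snippet below is a minimal, self-contained sketch of the problematic heuristic (illustrative names only, not the actual mmcv code): inferring the mode from the presence of a 'time' key can misclassify evaluation output and leads to the reported KeyError on 'data_time'.

```python
# Illustrative sketch of the old behaviour; names and values are assumptions.
log_buffer_output = {'time': 0.1, 'mIoU': 0.42}  # evaluation metrics plus a stale 'time' entry

# Old heuristic: 'time' present -> assume the buffer holds training logs.
mode = 'train' if 'time' in log_buffer_output else 'val'

if mode == 'train':
    # Training logs are expected to also contain 'data_time', but the buffer
    # actually holds evaluation results, so the lookup fails.
    try:
        data_time = log_buffer_output['data_time']
    except KeyError as err:
        print(f'KeyError: {err}')  # reproduces the symptom reported in the issue
```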
Modification
- Fix open-mmlab/mmsegmentation#1502 (KeyError: 'data_time')
- Add an attribute 'log_state' to log_buffer during evaluation
- Identify the log_buffer mode by the 'log_state' attribute instead of the 'time' attribute
- Do not log the 'log_state' attribute itself (see the sketch below)
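As a rough illustration of the items above, here is a self-contained sketch with assumed, simplified names (not the actual mmcv classes): the evaluation side tags the buffer with log_state='val', and the logging side reads that flag instead of probing for a 'time' key. Since only entries in `output` are written out, the flag itself never appears in the logs.

```python
class LogBufferSketch:
    """Simplified stand-in for the runner's log buffer."""

    def __init__(self):
        self.output = {}
        self.log_state = 'train'  # explicit flag introduced by this fix


def run_evaluation(log_buffer):
    # EvalHook side: tag the buffered results as evaluation output.
    log_buffer.log_state = 'val'
    log_buffer.output = {'mIoU': 0.42}


def log_buffer_contents(log_buffer):
    # LoggerHook side: read the flag instead of probing for a 'time' key;
    # only entries in `output` are logged, so the flag itself is never logged.
    mode = log_buffer.log_state
    print(mode, log_buffer.output)


buf = LogBufferSketch()
run_evaluation(buf)
log_buffer_contents(buf)  # -> val {'mIoU': 0.42}
```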
Checklist
Before PR:
- [ ] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
- [ ] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
- [ ] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] The documentation has been modified accordingly, including docstring or example tutorials.
After PR:
- [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
- [ ] CLA has been signed and all committers have signed the CLA in this PR.
But should `log_state` be reset to 'train' in `after_train_iter` or `after_train_epoch`?
Sure! That's a good idea. I will submit an amendment ASAP. Thanks.
And, I've signed the CLA.
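For illustration only (assumed, simplified names; not the actual mmcv Hook API), the suggested reset could look roughly like this:

```python
from types import SimpleNamespace

def after_train_iter(runner):
    # Reset the flag so the next training logs are not mislabelled as 'val'.
    runner.log_buffer.log_state = 'train'

def after_train_epoch(runner):
    runner.log_buffer.log_state = 'train'

# Tiny stand-in runner to show the effect.
runner = SimpleNamespace(log_buffer=SimpleNamespace(log_state='val'))
after_train_iter(runner)
print(runner.log_buffer.log_state)  # 'train'
```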
Hi @HAOCHENYE ,
Instead of using `log_state`, I added a `mode` property to `LogBuffer` to represent the mode of the contents of the log buffer.
Before each iteration, the mode is set according to the runner's mode.
While evaluating the model with `EvalHook`, `EvalHook` sets the mode to `'val'`.
By the way, I found that `test_logger` did not use the `runner` parameter as intended, and I fixed it in another commit.
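A minimal sketch of this revised approach, with assumed and simplified names rather than the actual mmcv classes:

```python
class LogBufferSketch:
    """Illustrative buffer with a `mode` property; not the real mmcv LogBuffer."""

    def __init__(self):
        self.output = {}
        self._mode = 'train'

    @property
    def mode(self):
        return self._mode

    @mode.setter
    def mode(self, value):
        # Validate on assignment so an unexpected mode fails loudly.
        assert value in ('train', 'val'), f'unexpected mode: {value}'
        self._mode = value


class RunnerSketch:
    def __init__(self):
        self.mode = 'train'
        self.log_buffer = LogBufferSketch()

    def before_iter(self):
        # Keep the buffer's mode in sync with the runner before every iteration.
        self.log_buffer.mode = self.mode


def eval_hook_evaluate(runner):
    # An evaluation hook marks the buffered results as validation output.
    runner.log_buffer.mode = 'val'
    runner.log_buffer.output = {'mIoU': 0.42}


runner = RunnerSketch()
runner.before_iter()           # buffer follows the runner's 'train' mode
eval_hook_evaluate(runner)
print(runner.log_buffer.mode)  # 'val'
```

Using a property rather than a bare attribute keeps the flag readable like plain state while letting the setter reject values other than 'train' and 'val'.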
Hi~ This reminds me: why don't we use `runner.mode`? In fact, the runner has a mode attribute, but it seems that it is not set correctly during training.
I think both ways work. But it reads more like "evaluating the model during the training phase (mode) of a runner", so it is clearer to define the mode on the log_buffer. Moreover, I'm not sure whether changing the runner's mode would introduce any potential risk. Therefore, I decided to add a mode attribute to the log_buffer.