mmcv icon indicating copy to clipboard operation
mmcv copied to clipboard

[Fix] Fix invalid access to training-state log while evaluation

Open mob5566 opened this issue 3 years ago • 5 comments
trafficstars

Motivation

Previously, the LoggerHook identified the mode of log_buffer by checking the 'time' attribute in the log_buffer of the runner, which is not reasonable and causes an issue (open-mmlab/mmsegmentaion#1502). Also, in #1252 and #1261, suggesting setting the priority of EvalHook to "LOW" was just a workaround and did not solve the root cause of this issue.

To solve this issue, it should identify the mode by a reliable state flag such that the LoggerHook can identify the correct mode of the log_buffer of the runner during training. Therefore, in this commit, I add a 'log_state' attribute while evaluating to help identify the correct mode.

Modification

  • Fix opem-mmlab/mmsegmentation#1502 (KeyError: 'data_time')

  • Add an attribute 'log_state' to log_buffer during evaluation

  • Identify the log_buffer mode by 'log_state' attribute instead of 'time' attribute

  • No need to log the 'log_state'

Checklist

Before PR:

  • [ ] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • [ ] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • [ ] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
  • [ ] New functionalities are covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • [ ] The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
  • [ ] CLA has been signed and all committers have signed the CLA in this PR.

mob5566 avatar May 20 '22 12:05 mob5566

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 20 '22 12:05 CLAassistant

But should log_state reset to 'train' in 'after_train_iter' or 'after_train_epoch' ?

Sure! That's a good idea. I will submit an amendment ASAP. Thanks.

And, I've signed the CLA.

mob5566 avatar May 21 '22 16:05 mob5566

Hi @HAOCHENYE ,

Instead of using log_state, I added a property mode for LogBuffer to present the mode of the content of the log buffer. Before each iteration, the mode will be set to a specific mode according to its runner's mode. While evaluating the model by EvalHook, EvalHook will set the mode to 'val' for evaluation.

By the way, I found that test_logger did not use the parameter runner accordingly, and I fixed it with another commit.

mob5566 avatar May 22 '22 11:05 mob5566

Hi @HAOCHENYE ,

Instead of using log_state, I added a property mode for LogBuffer to present the mode of the content of the log buffer. Before each iteration, the mode will be set to a specific mode according to its runner's mode. While evaluating the model by EvalHook, EvalHook will set the mode to 'val' for evaluation.

By the way, I found that test_logger did not use the parameter runner accordingly, and I fixed it with another commit.

Hi~ This reminds me: why we don't use runner.mode ? In fact, the runner has the mode attribute, but it seems that it is not set correctly during training.

HAOCHENYE avatar May 23 '22 08:05 HAOCHENYE

Hi~ This reminds me: why we don't use runner.mode ? In fact, the runner has the mode attribute, but it seems that it is not set correctly during training.

I think both ways work. But, I think it's more like "evaluating the model during the training phase (mode) of a runner", so it's much clear to define the mode of the log_buffer. Moreover, I'm not sure if the mode of the runner changed, there would be any potential risk. Therefore, I decided to add a mode attribute for the log_buffer.

mob5566 avatar May 23 '22 10:05 mob5566