mmcv
[Fix] DDP duplicate logs handling
Motivation
Essentially, PyTorch DDP attaches a StreamHandler to the root logger right after the first forward() call, while by default logger.root.handlers may be an empty list.
Modification
Manually appended a StreamHandler with level ERROR to the root logger's handlers list when that list is empty.
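A minimal sketch of the idea (the helper name and call site here are illustrative assumptions, not the exact diff):
import logging

def ensure_root_stream_handler():
    # hypothetical helper: if the root logger has no handlers yet, attach a
    # StreamHandler at ERROR level so that records propagated from child
    # loggers are not re-emitted at INFO level once DDP adds its own handler
    if not logging.root.handlers:
        handler = logging.StreamHandler()
        handler.setLevel(logging.ERROR)
        logging.root.addHandler(handler)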
Checklist
Before PR:
- [x] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
- [x] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
- [x] Bug fixes are covered by unit tests; the case that causes the bug should be added to the unit tests.
- [x] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [x] The documentation has been modified accordingly, including docstring or example tutorials.
After PR:
- [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
- [x] CLA has been signed and all committers have signed the CLA in this PR.
Hi @sleep3r, thanks for your contribution. What is your PyTorch version?
@zhouzaida Tested with PyTorch 1.10.1
I tried your logging utils with my train loop and noticed some strange logging behaviour; let me show you an example.

Here you can see that by default the root logger has no handlers, and DDP attaches one after the forward pass of the batch with idx 1. Therefore, the following logs are duplicated.
I may be missing something regarding the influence on downstream or other related projects, but this seems like a useful note for anyone hitting a similar problem.
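Roughly, the same duplication can be reproduced without DDP by mimicking what happens (a sketch, not my actual train loop):
import logging

logger = logging.getLogger('mmcv')          # child logger with its own handler
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

logger.info('before forward')               # printed once: root has no handlers yet

# roughly what DDP does after the first forward(): a handler appears on root
logging.root.addHandler(logging.StreamHandler())

logger.info('after forward')                # printed twice: own handler + propagation to root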
Thank you very much for the description. BTW, is it convenient for you to test whether torch<1.10.1 has similar behavior?
@zhouzaida I've just tested. It seems this behaviour starts from torch==1.9.0.
1.8.0 and 1.8.1 are fine.
Got it.
LGTM. But it seems that the propagate attribute of the Logger causes the duplicate log. Can setting logger.propagate
to False solve the problem of duplicate logs? That way we would not need to change the attributes of the root logger any more.
@HAOCHENYE Well, yes, logger.propagate = False solves this issue.
But that fix doesn't build on the approach already stated in utils/logging.py ¯\_(ツ)_/¯
I mean, is it legitimate to replace all that StreamHandler setup just with logger.propagate = False? I don't think so.
The init_process_group function in PyTorch 1.8 adds a StreamHandler to the root logger, so if get_logger is called after init_process_group, the level of that StreamHandler is set to ERROR by get_logger and there is no duplicate log. PyTorch 1.10 and PyTorch 1.7 do not attach a StreamHandler to the root logger during the init_process_group phase but during DDP.forward (logging.info is called, which adds the default StreamHandler (INFO) to the root logger; logging.info is also called in PyTorch 1.8, but there the level of the StreamHandler was already set to ERROR by get_logger). The level of the root StreamHandler at this point is INFO, so there is a duplicate log caused by the propagate mechanism.

Since the duplicate log is caused by the propagate attribute of the child logger, should we set propagate to False? Here is the logger setting of Detectron2: https://github.com/facebookresearch/detectron2/blob/cbbc1ce26473cb2a5cc8f58e8ada9ae14cb41052/detectron2/utils/logger.py
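Roughly what I mean (a get_logger-style sketch in the spirit of the Detectron2 link above, not the actual mmcv implementation):
import logging

def get_logger_sketch(name='mmcv', log_level=logging.INFO):
    # with propagate=False the record never reaches the root logger, so
    # whatever handler init_process_group or DDP.forward attaches there
    # cannot emit it a second time
    logger = logging.getLogger(name)
    logger.propagate = False
    handler = logging.StreamHandler()
    handler.setLevel(log_level)
    logger.addHandler(handler)
    logger.setLevel(log_level)
    return logger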
LGTM. I'll do further testing in the downstream codebase.
Hi @sleep3r, is there any progress?
Hi @sleep3r , unit tests failed. https://app.circleci.com/pipelines/github/open-mmlab/mmcv/582/workflows/fe9adf11-579c-45c9-8802-4a15f74bb9d9/jobs/1207?invite=true#step-110-124
@zhouzaida Yep, I mentioned this failed test above. But I don't have any thoughts on how to fix it, unfortunately. Essentially, my new commit contains only some comment changes.
I could try the former fix without .propagate
Luckily, it works.
@sleep3r thanks for your reply~.
If we set a default StreamHandler for logging.root in get_logger, it seems no message will be shown in the terminal when calling logging.warning('xxx'):
import logging
sh = logging.StreamHandler()
sh.setLevel(logging.ERROR)
logging.root.addHandler(sh)
logging.warning('xxx')  # nothing is printed: the only root handler is at ERROR level
Is this behavior reasonable? Besides, what will happen if we set propagate=False? Maybe there is some corner case I have not considered.
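For reference, flipping propagate on the child logger behaves roughly like this (a toy example, not the mmcv code):
import logging

root_handler = logging.StreamHandler()
root_handler.setLevel(logging.INFO)
logging.root.addHandler(root_handler)

logger = logging.getLogger('mmcv')
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

logger.info('propagate is True')     # emitted twice: own handler + root handler
logger.propagate = False
logger.info('propagate is False')    # emitted once: own handler only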
@HAOCHENYE I can't reproduce the behaviour with the warning logging. Setting propagate=False leads to failing tests, and I don't have much to add on this question either.
Hi~ sorry for my late reply. This is the original issue: #1691. I think it would be better if we did not need to change the root logger.
Hi @sleep3r! We are grateful for your efforts in helping improve this open-source project during your personal time. Welcome to join the OpenMMLab Special Interest Group (SIG) private channel on Discord, where you can share your experiences and ideas, and build connections with like-minded peers. To join the SIG channel, simply message the moderator, OpenMMLab, on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA If you have a WeChat account, welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + Github ID" as a remark when adding friends :) Thank you again for your contribution ❤ @sleep3r