mmcv [Fix] DDP duplicate logs handling

Motivation

Essentially, PyTorch DDP attaches StreamHandler directly after first forward() and by default logger.root.handlers may be empty list

Modification

Added manual append to root.handlers list of StreamHandler with level ERROR in case of empty list

Checklist

Before PR:

[x] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
[x] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
[x] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
[x] New functionalities are covered by complete unit tests. If not, please add more unit test to ensure the correctness.
[x] The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

[ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
[x] CLA has been signed and all committers have signed the CLA in this PR.

Jan 22 '22 20:01 sleep3r

All committers have signed the CLA.

Jan 22 '22 20:01 CLAassistant

Hi @sleep3r ,thanks for your contribution. What is your PyTorch version?

Jan 23 '22 01:01 zhouzaida

@zhouzaida Tested with PyTorch 1.10.1

I tried your logging utils with my trainloop and figured out some strange logging behaviour, let me show you some example.

Jan 23 '22 11:01 sleep3r

Снимок экрана 2022-01-23 в 14 59 03

Jan 23 '22 11:01 sleep3r

Here you could see, that by default the logger has not any handlers and DDP attaches one after forward of batch with idx 1. Therefore, the following logs are duplicated

Jan 23 '22 12:01 sleep3r

I may be missing something in case of influence on downstream or other related projects, but it seems for me to be a good notice for someone with a possible similar problem

Jan 23 '22 12:01 sleep3r

Thank you very much for the description. BTW, is it convenient for you to test that torch<1.10.1 has similar behavior

Jan 23 '22 12:01 zhouzaida

@zhouzaida I've just tested. So, seems like this behaviour starts from torch==1.9.0.

1.8.0 and 1.8.1 are fine

Jan 23 '22 14:01 sleep3r

@zhouzaida I've just tested. So, seems like this behaviour starts from torch==1.9.0.

1.8.0 and 1.8.1 are fine

Got it.

Jan 23 '22 15:01 zhouzaida

LGTM. But it seems that the propagate attribute of Logger causes the duplicate log. Can set logger.propagate to False solve the problem of duplicate logs? This way we don't need the change attribute of the root logger any more。

Jan 26 '22 12:01 HAOCHENYE

@HAOCHENYE Well, yes logger.propagate=False solves this issue

Jan 26 '22 14:01 sleep3r

But this fix doesn't inherit the stated one in utils/logging.py ¯_(ツ)_/¯

Jan 26 '22 14:01 sleep3r

I mean, is it legit to replace all that StreamHandler setting stuff just by logger.propagate = False? Don't think so

Jan 26 '22 14:01 sleep3r

The init_process_group function in Pytorch 1.8 adds a StreamHandler to the root logger, so if get_logger is called after init_process_group, the log level of the root logger will be set to ERROR and there will be no duplicate log. Pytorch 1.10 and Pytorch1.7 do not attach a StreamHandler to the root logger during the init_process_group phase but during DDP.forward (Logging.info is called, which adds the default StreamHandler(INFO) to the root logger. Logging.info is also called in Pytorch1.8, while the log level of StreamHandler was set to ERROR by get_logger). The logging level of the StreamHandler at this point is INFO, so there will be a duplicate log caused by the propagate mechanism. Since the duplicate log is caused by the propagate attribute of the child logger, should we set the propagate to False? Here is the logger setting for Detectron2 https://github.com/facebookresearch/detectron2/blob/cbbc1ce26473cb2a5cc8f58e8ada9ae14cb41052/detectron2/utils/logger.py

Jan 26 '22 15:01 HAOCHENYE

LGTM. I‘ll do further testing in the downstream codebase.

Jan 27 '22 08:01 HAOCHENYE

Hi @sleep3r , is there any progress?

Feb 21 '22 03:02 teamwong111

Hi @sleep3r , unit tests failed. https://app.circleci.com/pipelines/github/open-mmlab/mmcv/582/workflows/fe9adf11-579c-45c9-8802-4a15f74bb9d9/jobs/1207?invite=true#step-110-124

May 12 '22 05:05 zhouzaida

@zhouzaida Yep, I mentioned this failed test above. But I haven't any thoughts how to fix it, unfortunately. Essentially, my new commit contains only some comment changes

May 12 '22 05:05 sleep3r

I could try the former fix without .propagate

May 12 '22 05:05 sleep3r

Lucky, it works

May 12 '22 06:05 sleep3r

@sleep3r thanks for your reply~. If we set default StreamHandler for logging.root in get_logger, It seems there will be no message shown in the terminal by calling logging.warning('xxx'):

import logging
sh = logging.StreamHandler()
sh.setLevel(logging.ERROR)
logging.root.addHandler(sh)

Is this behavior reasonable? Besides, what will happen if we set propagate=False, maybe there is some corner case I've not concerned.

May 17 '22 06:05 HAOCHENYE

@HAOCHENYE I can't reproduce the behaviour with warning logging. Setting propagate=False leads to fallen tests, and I don't have much about this question too.

Jul 02 '22 20:07 sleep3r

@HAOCHENYE I can't reproduce the behaviour with warning logging. Setting propagate=False leads to fallen tests, and I don't have much about this question too.

Hi~ sorry for my the reply. This is the original issue #1691 I think it could be better if we do not need to change the root logger

Jul 07 '22 02:07 HAOCHENYE

Hi @sleep3r！We are grateful for your efforts in helping improve this open-source project during your personal time. Welcome to join OpenMMLab Special Interest Group (SIG) private channel on Discord, where you can share your experiences, ideas, and build connections with like-minded peers. To join the SIG channel, simply message moderator— OpenMMLab on Discord or briefly share your open-source contributions in the #introductions channel and we will assist you. Look forward to seeing you there! Join us ：https://discord.gg/UjgXkPWNqA If you have a WeChat account，welcome to join our community on WeChat. You can add our assistant ：openmmlabwx. Please add "mmsig + Github ID" as a remark when adding friends：） Thank you again for your contribution❤ @sleep3r

Apr 12 '23 08:04 OpenMMLab-Assistant003

mmcv mmcv copied to clipboard

[Fix] DDP duplicate logs handling

Motivation

Modification

Checklist

mmcv
mmcv copied to clipboard