Improve logging in datasets?

Open pmeier opened this issue 3 years ago • 1 comments

Status Quo

Currently our datasets sometimes print diagnostic messages:

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/caltech.py#L128

The common download utilities write to STDOUT

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/utils.py#L156

and use tqdm which writes to STDERR:

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/utils.py#L36

tqdm can be told to write to a different stream, but our fallback from torch.hub cannot.

In some cases, information is also logged by our dependencies:

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/coco.py#L36

In any case, the user has no control over it whatsoever.

Proposal

Have a global or local setting for the stream we write to. For example:

torchvision.datasets.logging_stream()

I would default it to sys.stdout, but I have no strong opinion. To silence everything, one could do:

import os

import torchvision

torchvision.datasets.logging_stream(open(os.devnull, "w"))

We could also add a shortcut with quiet=True for that.

Priority

This was touched on in https://github.com/pytorch/vision/issues/330#issuecomment-854715846, and from time to time we receive issues asking to either silence the output (#330) or redirect it to a different stream (#7040).

Still, I think the priority is pretty low for this. I just wanted to have it in a separate issue to make it easier to track.

Dec 20 '22 11:12 pmeier

Hi @pmeier!

Could I help you with this? I would also be interested in allowing the user to disable some of the messages. Sorry for the beginner question, but is there also a reason not to use Python's logging module (as in lightning, for instance)?
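
To illustrate what I mean, here is a rough sketch of a logging-based approach. The logger name `torchvision.datasets`, the helper function, and the message text are only assumptions for the example, not something torchvision defines today.

```python
import logging

# Hypothetical logger name; used here only for illustration.
logger = logging.getLogger("torchvision.datasets")
# A NullHandler keeps the library from installing its own output handler;
# the application decides where (and whether) messages are shown.
logger.addHandler(logging.NullHandler())


def report_already_downloaded() -> None:
    # Hypothetical replacement for a bare print() inside a dataset.
    logger.info("Files already downloaded and verified")


# User side: raise the threshold to hide informational messages.
logging.getLogger("torchvision.datasets").setLevel(logging.WARNING)
report_already_downloaded()  # filtered out, since INFO is below WARNING
```

Users could then filter by level or attach their own handlers to send the messages to a file or another stream, without a torchvision-specific setting.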

Jul 23 '24 10:07 o-laurent