Improve logging in datasets?
Status Quo
Currently our datasets sometimes print diagnostic messages:
https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/caltech.py#L128
The common download utilities write to STDOUT:
https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/utils.py#L156
and use tqdm, which writes to STDERR:
https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/utils.py#L36
tqdm itself can be pointed at a different stream through its file argument, but our fallback from torch.hub cannot.
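For illustration, here is a sketch of a stream-aware fallback, assuming we kept a small in-house progress bar for when tqdm is not installed. The class name FallbackProgressBar is hypothetical and is not the torch.hub implementation; only the file argument mirrors what tqdm already accepts.

```python
import io
import sys
from typing import Optional, TextIO


class FallbackProgressBar:
    """Hypothetical minimal progress bar for when tqdm is not installed.

    Like tqdm, it accepts a ``file`` argument and falls back to sys.stderr.
    """

    def __init__(self, total: Optional[int] = None, file: Optional[TextIO] = None,
                 disable: bool = False) -> None:
        self.total = total
        self.n = 0
        self.file = file
        self.disable = disable

    def _write(self, text: str) -> None:
        # Resolve sys.stderr at write time so later redirections are respected.
        stream = self.file if self.file is not None else sys.stderr
        stream.write(text)
        stream.flush()

    def update(self, n: int = 1) -> None:
        self.n += n
        if self.disable:
            return
        if self.total:
            self._write(f"\r{100 * self.n / self.total:.1f}%")
        else:
            self._write(f"\r{self.n}")

    def close(self) -> None:
        if not self.disable:
            self._write("\n")

    def __enter__(self) -> "FallbackProgressBar":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


# Usage: send the progress output to an in-memory buffer instead of STDERR.
buf = io.StringIO()
with FallbackProgressBar(total=4, file=buf) as bar:
    bar.update(2)
    bar.update(2)
```

With a file argument in place, the fallback could honor the same stream setting as tqdm, so both code paths behave identically.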
In some cases, our dependencies also emit information:
https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/coco.py#L36
In any case, the user has no control over it whatsoever.
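Short of patching torchvision, the only blunt workaround available today is to temporarily swap the process-wide sys.stdout and sys.stderr with the standard library's contextlib helpers. A minimal sketch, where noisy_dataset_call is a hypothetical stand-in for any dataset or dependency that prints:

```python
import contextlib
import io
import os


def noisy_dataset_call() -> None:
    # Hypothetical stand-in for any dataset or dependency that prints.
    print("some diagnostic message")


# Discard everything written to sys.stdout and sys.stderr inside the block.
with open(os.devnull, "w") as devnull, \
        contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull):
    noisy_dataset_call()

# Or capture the output for later inspection instead of discarding it.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    noisy_dataset_call()
```

This swallows unrelated output as well, has a global side effect that makes it unsuitable for threaded code, and does not affect the output of subprocesses, which is why a dedicated setting would be nicer.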
Proposal
Have a global or local setting for the stream we write to. For example:
torchvision.datasets.logging_stream()
I would default it to sys.stdout, but I have no strong opinion. To silence everything, one could do:
import os
import torchvision
torchvision.datasets.logging_stream(open(os.devnull, "w"))
We could also add a quiet=True shortcut for that.
Priority
This was touched on in https://github.com/pytorch/vision/issues/330#issuecomment-854715846, and from time to time we receive issues (#330) asking us to either silence the output or redirect it to a different stream (#7040).
Still, I think the priority of this is pretty low. I just wanted to have it in a separate issue to make it easier to track.
Hi @pmeier!
Could I help you with this? I would also be interested in allowing the user to disable some of the messages. Sorry for the beginner question, but is there a reason not to use Python's logging module (as Lightning does, for instance)?
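For comparison, a minimal sketch of what the logging-module approach could look like, assuming a logger named after the package (torchvision.datasets) and a hypothetical report_download helper standing in for a dataset's print() call:

```python
import io
import logging

# --- Library side (sketch) ---
# Hypothetical logger name following the package hierarchy.
logger = logging.getLogger("torchvision.datasets")
# Common advice for libraries: attach only a NullHandler and leave
# output configuration to the application.
logger.addHandler(logging.NullHandler())


def report_download(path: str) -> None:
    # Hypothetical replacement for a print() call inside a dataset.
    logger.info("Dataset files found at %s", path)


# --- User side ---
# Route dataset messages to any stream and choose the verbosity.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
report_download("data/caltech101")

# Silence INFO messages again without touching any other library's output.
logging.getLogger("torchvision.datasets").setLevel(logging.WARNING)
report_download("data/caltech101")
```

The main trade-off: unless the user configures logging, INFO records are dropped (the effective level falls back to the root logger's WARNING), so today's default output would disappear. Also, tqdm progress bars are not log records, so they would still need a separate stream setting.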