datasets
Logging levels not taken into account
Describe the bug
The logging module isn't honoring the verbosity level that has been set.
Steps to reproduce the bug
from datasets import logging
logging.set_verbosity_debug()
logger = logging.get_logger()
logger.error("ERROR")
logger.warning("WARNING")
logger.info("INFO")
logger.debug("DEBUG")
Expected results
I expect all logs to be output, since I'm setting the debug level.
Actual results
Only the first two logs are output.
Environment info
- datasets version: 1.11.0
- Platform: Linux-5.13.9-arch1-1-x86_64-with-glibc2.33
- Python version: 3.9.6
- PyArrow version: 5.0.0
To go further
This logging issue appears in datasets but not in transformers. It happens because no handler is defined for the logger. When no handler is defined, the logging library falls back to its handler of last resort (logging.lastResort), a StderrHandler with level WARNING that writes to stderr, so INFO and DEBUG records are dropped.
transformers, by contrast, sets a default StreamHandler.
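The handler behaviour described here can be reproduced with the standard library alone. The following is a minimal sketch, not the datasets or transformers implementation: the logger name handlerless_demo and the helper emit_all are made up for illustration, and propagation is switched off so that the root logger's configuration does not interfere.

```python
import contextlib
import io
import logging


def emit_all(logger: logging.Logger) -> None:
    """Emit one record per level, mirroring the reproduction steps."""
    logger.error("ERROR")
    logger.warning("WARNING")
    logger.info("INFO")
    logger.debug("DEBUG")


# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("handlerless_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # isolate the demo from any root logger handlers

# Case 1: no handler attached, so logging.lastResort (level WARNING) is used.
without_handler = io.StringIO()
with contextlib.redirect_stderr(without_handler):
    emit_all(logger)

# Case 2: an explicit StreamHandler (default level NOTSET) lets every record through.
with_handler = io.StringIO()
handler = logging.StreamHandler(with_handler)
logger.addHandler(handler)
emit_all(logger)
logger.removeHandler(handler)

print(without_handler.getvalue().split())  # ['ERROR', 'WARNING']
print(with_handler.getvalue().split())     # ['ERROR', 'WARNING', 'INFO', 'DEBUG']
```

If this matches what happens inside datasets, attaching a default handler to the library's root logger would be one possible fix.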
I just took a look at all the outputs produced by datasets using the different log levels.
As far as I can tell, using datasets==1.17.0, the overall issue seems to be fixed.
However, I noticed that one tqdm-based progress indicator appears on STDERR that I simply cannot suppress:
Resolving data files: 100%|██████████| 652/652 [00:00<00:00, 1604.52it/s]
According to _get_origin_metadata_locally_or_by_urls, it should be suppressible by using the NOTSET log level:
https://github.com/huggingface/datasets/blob/1406a04c3e911cec2680d8bc513653e0cafcaaa4/src/datasets/data_files.py#L491-L501
Sadly, specifying the log level NOTSET seems to have no effect.
But apart from it not having any effect, I must admit that it seems unintuitive to me. I would suggest changing this so that the bar is only shown when the verbosity is INFO or higher (that is, at INFO or DEBUG).
This would conform better to the description of INFO in the documentation:
"This will display most of the logging information and tqdm bars."
Any input on this? I will be happy to supply a PR if desired 👍
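To make the suggestion concrete, here is a rough sketch of how the progress bar could be tied to the verbosity. It uses only the standard library. The helper progress_bar_disabled and the logger names are hypothetical, not part of datasets. The last part shows a standard-library detail that may or may not explain the NOTSET behaviour: a non-root logger set to NOTSET reports its parent's effective level.

```python
import logging


def progress_bar_disabled(logger: logging.Logger) -> bool:
    """Hypothetical helper: hide progress bars unless verbosity is INFO or DEBUG."""
    return logger.getEffectiveLevel() > logging.INFO


# Hypothetical logger hierarchy, used only for this illustration.
parent = logging.getLogger("progress_demo")
child = logging.getLogger("progress_demo.child")

parent.setLevel(logging.WARNING)
hidden_at_warning = progress_bar_disabled(parent)  # True: bar hidden

parent.setLevel(logging.INFO)
hidden_at_info = progress_bar_disabled(parent)     # False: bar shown

parent.setLevel(logging.DEBUG)
hidden_at_debug = progress_bar_disabled(parent)    # False: bar shown

# A non-root logger set to NOTSET defers to its parent's level,
# so its effective level is WARNING here, never NOTSET.
parent.setLevel(logging.WARNING)
child.setLevel(logging.NOTSET)
child_effective = child.getEffectiveLevel()

# The flag could then be handed to tqdm through its `disable` argument,
# e.g. tqdm(items, disable=progress_bar_disabled(parent)).
```

With a check like this, WARNING and above would hide the bar, while INFO and DEBUG would show it.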
Hi! This should disable the tqdm output:
import datasets
datasets.set_progress_bar_enabled(False)
On a side note: I believe the issue with logging (not tqdm) is still relevant on master.