Logging levels not taken into account

Open LysandreJik opened this issue 3 years ago • 2 comments

Describe the bug

The logging module does not respect the verbosity level that is set.

Steps to reproduce the bug

from datasets import logging

logging.set_verbosity_debug()
logger = logging.get_logger()

logger.error("ERROR")
logger.warning("WARNING")
logger.info("INFO")
logger.debug("DEBUG")

Expected results

I expect all four logs to be output, since I set the verbosity to debug.

Actual results

Only the first two logs (ERROR and WARNING) are output.

Environment info

  • datasets version: 1.11.0
  • Platform: Linux-5.13.9-arch1-1-x86_64-with-glibc2.33
  • Python version: 3.9.6
  • PyArrow version: 5.0.0

To go further

This logging issue appears in datasets but not in transformers. It happens because no handler is defined for the logger. When no handler is defined, the logging library falls back to its handler of last resort (logging.lastResort), a _StderrHandler with level WARNING that writes to stderr, so INFO and DEBUG records are dropped.

transformers sets a default StreamHandler here
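
The mechanism can be reproduced with the standard library alone. The following is a minimal sketch, not the datasets or transformers code: the helper `logged_messages` and the logger names are hypothetical, and propagation is switched off only so that handlers on the root logger cannot interfere.

```python
import contextlib
import io
import logging


def logged_messages(logger_name: str, attach_handler: bool) -> list[str]:
    """Log one message per level and return the messages that reach stderr."""
    # Hypothetical logger for this sketch; it is not the logger datasets uses.
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # ignore any handlers attached to the root logger

    handler = None
    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        if attach_handler:
            # StreamHandler() writes to the current sys.stderr and has level
            # NOTSET, so it passes on every record the logger lets through.
            handler = logging.StreamHandler()
            logger.addHandler(handler)
        # Without a handler, records go to logging.lastResort (level WARNING).
        logger.error("ERROR")
        logger.warning("WARNING")
        logger.info("INFO")
        logger.debug("DEBUG")

    if handler is not None:
        logger.removeHandler(handler)
    return captured.getvalue().split()


print(logged_messages("demo.no_handler", attach_handler=False))
# ['ERROR', 'WARNING']
print(logged_messages("demo.with_handler", attach_handler=True))
# ['ERROR', 'WARNING', 'INFO', 'DEBUG']
```

Setting the logger level to DEBUG is not enough on its own: the record also has to pass the level of whichever handler ends up processing it, and the last-resort handler only accepts WARNING and above.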

LysandreJik, Aug 24 '21 11:08

I just took a look at all the outputs produced by datasets at the different log levels. As far as I can tell, using datasets==1.17.0, the overall issue seems to be fixed.

However, I noticed that there is one tqdm-based progress indicator appearing on STDERR that I simply cannot suppress.

Resolving data files: 100%|██████████| 652/652 [00:00<00:00, 1604.52it/s]

According to _get_origin_metadata_locally_or_by_urls, it should be suppressible by using the NOTSET log level: https://github.com/huggingface/datasets/blob/1406a04c3e911cec2680d8bc513653e0cafcaaa4/src/datasets/data_files.py#L491-L501 Sadly, specifying the NOTSET log level seems to have no effect.

But apart from it not having any effect, I must admit that it seems unintuitive to me. I would suggest changing this so that the progress bar is only shown when the log level is greater than or equal to INFO.

This would conform better to how the documentation describes INFO:

This will display most of the logging information and tqdm bars.

Any input on this? I will be happy to supply a PR if desired 👍

yweweler, Jan 17 '22 13:01

Hi! This should disable the tqdm output:

import datasets
datasets.set_progress_bar_enabled(False)

On a side note: I believe the issue with logging (not tqdm) is still relevant on master.

mariosasko, Jan 19 '22 14:01