python-coloredlogs

UnicodeDecodeError can happen when passing objects as msg on python2

Open perrinjerome opened this issue 3 years ago • 0 comments

First of all, this problem only affects Python 2.7, so if you think it's no longer relevant, don't hesitate to close this issue.

I thought you might still be interested, because this package still claims to support Python 2 in setup.py's classifiers, and this scenario works fine with standard logging but breaks when coloredlogs is used. On Python 3 this problem does not happen.

The common logging API, described at https://docs.python.org/2/library/logging.html#logging.debug, is:

logging.debug(msg[, *args[, **kwargs]])

The most common usage is to pass a string as msg, but as described in https://docs.python.org/3/howto/logging.html#using-arbitrary-objects-as-messages, passing an arbitrary object as msg is also supported, and its __str__ method is used to convert the object to a string.
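As a minimal sketch of that behaviour, the snippet below logs an object instead of a string and captures what the handler writes. It is written for Python 3 (where the bug described here does not occur) and only illustrates the `__str__` conversion; `Payload`, `log_object` and the logger name `msg_object_demo` are hypothetical names chosen for this example.

```python
import io
import logging


class Payload(object):
    """Hypothetical message object; only __str__ matters here."""

    def __str__(self):
        return "payload rendered by __str__"


def log_object(obj):
    """Log obj as the message and return the text the handler wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger = logging.getLogger("msg_object_demo")  # hypothetical logger name
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)
    try:
        # The object is passed as msg; logging converts it with str().
        logger.critical(obj)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(log_object(Payload()), end="")
```

The logger never receives a string here; the conversion happens inside the logging machinery when the record is formatted.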

With the standard Python logging module this works fine, even when the string contains non-ASCII characters. For example:

# coding: utf-8
class O:
  def __str__(self):
    return "💥"

import logging
logging.basicConfig()
logging.getLogger().critical(O())

correctly outputs:

CRITICAL:root:💥

but when coloredlogs is used, as in this example:

import coloredlogs
coloredlogs.install()
logging.getLogger().critical(O())

a UnicodeDecodeError is raised:

Traceback (most recent call last):
  File "../lib/python2.7/logging/__init__.py", line 868, in emit
    msg = self.format(record)
  File "../lib/python2.7/logging/__init__.py", line 741, in format
    return fmt.format(record)
  File "../coloredlogs/__init__.py", line 1137, in format
    copy.msg = ansi_wrap(coerce_string(record.msg), **style)
  File "../lib/python2.7/site-packages/humanfriendly/compat.py", line 119, in coerce_string
    return value if is_string(value) else unicode(value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

This happened with humanfriendly==10.0, where coerce_string is defined at https://github.com/xolox/python-humanfriendly/blob/6758ac61f906cd8528682003070a57febe4ad3cf/humanfriendly/compat.py#L101-L108
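The error message itself can be reproduced on Python 3 by making the decoding step explicit. On Python 2, `unicode(value)` on an object without `__unicode__` calls `__str__` and decodes the resulting bytes with the default ASCII codec; the sketch below assumes that this is the step failing in the traceback and mimics it with an explicit `bytes.decode("ascii")`. The helper name `describe_decode_failure` is hypothetical.

```python
def describe_decode_failure(raw):
    """Decode raw bytes as ASCII and return the error text, or None on success.

    This mimics (as an assumption about the failing step) the implicit ASCII
    decoding that Python 2 performs in unicode(value).
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# UTF-8 encoding of U+1F4A5 (the character returned by O.__str__ above):
# the four bytes f0 9f 92 a5.
raw_message = u"\U0001F4A5".encode("utf-8")

if __name__ == "__main__":
    print(describe_decode_failure(raw_message))
```

The first UTF-8 byte of that character is 0xf0, which is why the traceback reports position 0.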

Maybe this can be addressed in humanfriendly by making coerce_string try harder to decode the string. Since most strings are UTF-8 anyway, it could look something like this:

import sys

def coerce_string(value):
    # `is_string` and `unicode` are the names already used in humanfriendly/compat.py.
    if sys.version_info < (3,):
        # If value defines `__unicode__`, unicode() uses it directly. If it
        # does not and `__str__` returns bytes that cannot be decoded with
        # the default codec, fall back to decoding those bytes as UTF-8.
        try:
            value = unicode(value)
        except UnicodeDecodeError:
            value = unicode(str(value), 'utf-8', 'replace')
    return value if is_string(value) else unicode(value)
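Because the suggestion above only takes effect on Python 2, here is a rough Python 3 sketch of the same fallback idea applied to raw bytes: try the strict ASCII decode first, and on failure decode as UTF-8 with the `replace` error handler so that the call never raises. The function name `to_text` and its `fallback_encoding` parameter are hypothetical and not part of humanfriendly.

```python
def to_text(raw, fallback_encoding="utf-8"):
    """Decode bytes to text, falling back to a lenient decode on failure.

    Sketch only: `to_text` is a hypothetical helper, not a humanfriendly API.
    """
    try:
        # Same strict behaviour as the implicit default decode on Python 2.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Bytes that are invalid in the fallback encoding become U+FFFD
        # instead of raising a second exception.
        return raw.decode(fallback_encoding, "replace")


if __name__ == "__main__":
    print(to_text(b"plain"))
    print(to_text(b"\xf0\x9f\x92\xa5"))
```

Using `replace` trades fidelity for robustness: a message that is not valid UTF-8 is logged with replacement characters rather than aborting the log call.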

perrinjerome, Nov 07 '21 12:11