python-coloredlogs
UnicodeDecodeError can happen when passing objects as msg on python2
First of all, this is a problem only affecting python2.7, so if you think it's no longer relevant, don't hesitate to close this issue.
I thought that you might still be interested in this, because this package still claims to support python 2 in setup.py's classifiers, and this is a scenario which works fine with standard logging but breaks when coloredlogs is used. On python3 this problem does not happen.
The common logging API, described at https://docs.python.org/2/library/logging.html#logging.debug, is:
logging.debug(msg[, *args[, **kwargs]])
The most common usage is to pass a string as msg, but as we can see in https://docs.python.org/3/howto/logging.html#using-arbitrary-objects-as-messages it's also supported to pass arbitrary objects as msg, and their __str__ method will be used to convert the objects to strings.
With the python logging module this works fine, even when the string contains non-ASCII characters, for example:
# coding: utf-8
class O:
    def __str__(self):
        return "💥"

import logging
logging.basicConfig()
logging.getLogger().critical(O())
correctly outputs:
CRITICAL:root:💥
but when coloredlogs is used, like with this example:
import coloredlogs
coloredlogs.install()
logging.getLogger().critical(O())
a UnicodeDecodeError is raised:
Traceback (most recent call last):
  File "../lib/python2.7/logging/__init__.py", line 868, in emit
    msg = self.format(record)
  File "../lib/python2.7/logging/__init__.py", line 741, in format
    return fmt.format(record)
  File "../coloredlogs/__init__.py", line 1137, in format
    copy.msg = ansi_wrap(coerce_string(record.msg), **style)
  File "../lib/python2.7/site-packages/humanfriendly/compat.py", line 119, in coerce_string
    return value if is_string(value) else unicode(value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
This happened with humanfriendly==10.0, where coerce_string is defined at https://github.com/xolox/python-humanfriendly/blob/6758ac61f906cd8528682003070a57febe4ad3cf/humanfriendly/compat.py#L101-L108
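To illustrate where the error comes from: on python2, unicode(obj) calls obj.__str__() when the object has no __unicode__ method, and then decodes the resulting bytes with the default encoding, which is ASCII. The snippet below is only a sketch of that implicit step, written so that it runs on python2 and python3 alike; the variable names are mine and not part of coloredlogs or humanfriendly.

```python
# The UTF-8 bytes that a python2 __str__ returning "💥" would produce.
raw = u"\U0001F4A5".encode("utf-8")

try:
    # This mirrors the implicit ASCII decode that python2's unicode(obj)
    # applies to the bytes returned by obj.__str__().
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The first byte of the encoded character is 0xf0, which matches the byte reported in the traceback.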
Maybe this can be addressed in humanfriendly, by making coerce_string try harder to decode the string. Since most strings are UTF-8 anyway, maybe something like this:
import sys

def coerce_string(value):
    # is_string and unicode are the names already used in humanfriendly/compat.py.
    if sys.version_info < (3,):
        # If value defines `__unicode__`, unicode(value) uses it directly.
        # Otherwise it decodes the bytes returned by `__str__` as ASCII; if
        # that fails, decode the `__str__` output as UTF-8 instead.
        try:
            value = unicode(value)
        except UnicodeDecodeError:
            value = unicode(str(value), 'utf-8', 'replace')
    return value if is_string(value) else unicode(value)
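The fallback idea can also be tried in isolation, without python2 or humanfriendly. This is just a minimal sketch under my own assumptions: decode_with_fallback is a hypothetical helper name, it takes the raw bytes that __str__ would return on python2, and it tries ASCII first to mimic the current behaviour before falling back to lenient UTF-8.

```python
def decode_with_fallback(raw):
    """Decode bytes as ASCII first, then fall back to lenient UTF-8.

    Hypothetical helper for illustration only, not part of humanfriendly.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # With 'replace', bytes that are not valid UTF-8 become U+FFFD
        # instead of raising another exception.
        return raw.decode("utf-8", "replace")


ascii_text = decode_with_fallback(b"plain")
emoji_text = decode_with_fallback(u"\U0001F4A5".encode("utf-8"))
broken_text = decode_with_fallback(b"\xff")
```

Using 'replace' means that logging a message never fails because of its encoding, at the cost of showing a replacement character when the bytes are not valid UTF-8.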