Fix for UnicodeDecodeError exception

Open • wagnerscastle opened this issue 4 years ago • 1 comment

pp() fails on strings containing malformed Unicode that the standard pprint module handles fine, at least on Python 3.

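To fix this, pass the 'backslashreplace' error handler to decode(), so that undecodable bytes are escaped instead of raising. That is, replace seg_list = [seg.decode('utf8') for seg in seg_list] with seg_list = [seg.decode('utf8','backslashreplace') for seg in seg_list]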
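The difference between the two decode calls can be seen in isolation with a minimal sketch. The byte string and the helper name decode_segment below are hypothetical stand-ins chosen for illustration, not taken from beeprint's source. Only bytes.decode and its 'backslashreplace' error handler (available for decoding since Python 3.5) are real APIs here.

```python
# Hypothetical segment: bytes that are not valid UTF-8 because 0xe9 starts
# a multi-byte sequence but is followed by an ASCII quote.
seg = b"'string\xe9',"


def decode_segment(segment, errors="strict"):
    """Decode one segment as UTF-8 using the given error handler."""
    return segment.decode("utf8", errors)


# Default (strict) decoding raises, as in the traceback reported in this issue.
try:
    decode_segment(seg)
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decode failed:", exc.reason)

# With 'backslashreplace', the bad byte is kept as a literal \xe9 escape.
patched = decode_segment(seg, "backslashreplace")
print(patched)
```

The trade-off is that the output no longer distinguishes a real é from an escaped byte, but for a pretty-printer that seems preferable to aborting mid-output.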

To reproduce:

pp({'1':'string\xe9'})
{
  '1':
Traceback (most recent call last):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 6: invalid continuation byte

After patch:

pp({'1':'string\xe9'})
{
  '1': 'string\xe9',
}

Note that the same kind of string prints fine when it is used as a dict key. For example (without the patch):

pp({'\xe91\xe9':'string'})
{
  'é1é': 'string',
}

May 07 '21 18:05 wagnerscastle

I’m dealing with the UnicodeDecodeError as well: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 43: invalid start byte.

It happens when trying to print a JSON-stringified Python dict. I forked beeprint with the fix, and I can verify that it works.

Nov 23 '23 15:11 Nevrai