
UnicodeDecodeError: 'ascii' codec can't decode byte ...

Open wookayin opened this issue 6 years ago • 28 comments

Environment:

  • Python 2.7
  • Linux x64

Stack trace:

  ...
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/debugger.py", line 360, in user_call
    self.interaction(frame)
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/debugger.py", line 349, in interaction
    show_exc_dialog=show_exc_dialog)
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/debugger.py", line 2089, in call_with_ui
    return f(*args, **kwargs)
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/debugger.py", line 2330, in interaction
    self.event_loop()
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/debugger.py", line 2288, in event_loop
    canvas = toplevel.render(self.size, focus=True)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 1751, in render
    canv = get_delegate(self).render(size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/container.py", line 1083, in render
    focus and self.focus_part == 'body')
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/decoration.py", line 225, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/container.py", line 2085, in render
    focus = focus and self.focus_position == i)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 1751, in render
    canv = get_delegate(self).render(size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/container.py", line 1526, in render
    canv = w.render((maxcol, rows), focus=focus and item_focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/decoration.py", line 225, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/container.py", line 1526, in render
    canv = w.render((maxcol, rows), focus=focus and item_focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/decoration.py", line 225, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 1751, in render
    canv = get_delegate(self).render(size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/listbox.py", line 475, in render
    focus_canvas = focus_widget.render((maxcol,), focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/var_view.py", line 164, in render
    return make_canvas(text, attr, maxcol, apfx+"value")
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/ui_tools.py", line 38, in make_canvas
    line_attr = list(get_byte_line_attr(line, line_attr))
  File "/home/wookayin/.local/lib/python2.7/site-packages/pudb/ui_tools.py", line 34, in get_byte_line_attr
    byte_count = len(line[i:i+column_count].encode(_target_encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)

Link: ui_tools.py:34.

I am not sure when this happens or which line is being read (it is hard to print them out), but I could work around it with the following dirty patch, ~simply ignoring the error~ (UPDATED) not calling encode():

         def get_byte_line_attr(line, line_attr):
             i = 0
             for label, column_count in line_attr:
-                byte_count = len(line[i:i+column_count].encode(_target_encoding))
+                byte_count = len(line[i:i+column_count])
                 i += column_count
                 yield label, byte_count

wookayin avatar Oct 04 '17 00:10 wookayin

Version of pudb? Urwid? Installed from where? Does it happen with git master?

inducer avatar Oct 04 '17 01:10 inducer

Can you test if https://github.com/inducer/pudb/pull/273 fixes it (sorry, I need to finish that PR).

asmeurer avatar Oct 04 '17 02:10 asmeurer

I was using pudb 2017.1.4 (installed from pypi) and the master (676ee9d2) from git, and urwid==1.3.1.

I printed the content of line in such situations and found that the string contains a Unicode character (which actually came from a CUDA array). The type of line was str, not unicode. Its representation is \xe2\x8b\xb1. So I think line[i:i+column_count] should be a unicode object (in Python 2). I can confirm that _target_encoding is utf-8.

Note that:

>>> '⋱'.encode('utf8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

>>> u'⋱'.encode('utf8')
'\xe2\x8b\xb1'
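
Getting a decode error from an encode() call makes sense once you recall that Python 2 first converts a byte str to unicode with the default ASCII codec and only then encodes. A minimal sketch that emulates this two-step behaviour in Python 3 (the helper name py2_style_encode is made up for illustration):

```python
def py2_style_encode(byte_string, encoding):
    """Emulate Python 2's str.encode(): implicit ASCII decode, then encode."""
    # Step 1 is the hidden part: Python 2 decodes the byte string as ASCII.
    as_text = byte_string.decode("ascii")
    # Step 2 is the encode the caller actually asked for.
    return as_text.encode(encoding)


raw = b"\xe2\x8b\xb1"  # UTF-8 bytes of U+22F1, i.e. what Python 2 calls a `str`

try:
    py2_style_encode(raw, "utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # raised in step 1, before any encoding happens

# A real text string encodes without trouble.
encoded = "\u22f1".encode("utf-8")
```

Pure-ASCII byte strings pass through both steps unchanged, which is presumably why the problem only shows up once a non-ASCII repr appears.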

By the way, can you clarify why we need .encode()?

@asmeurer I will also give your patch a try later. But given the above observation (that this is a Unicode issue), I doubt it will fix the problem.

wookayin avatar Oct 04 '17 03:10 wookayin

So the following is a minimal 'killer' of pudb that reproduces the crash (I'll try to find another one without torch):

import torch
a = torch.zeros(500, 500)

import pudb; pudb.set_trace()  # XXX DEBUG

python killer.py crashes pudb, but pudb killer.py doesn't. This only happens in Python 2 (Python 3 is okay).

I guess the problem is that, even in var_view.py, value_str still contains a Unicode character.

wookayin avatar Oct 04 '17 04:10 wookayin

A repro script that doesn't require torch would be great.

asmeurer avatar Oct 04 '17 18:10 asmeurer

I've come up with this one (killer.py):

class S(object):
    def __repr__(self):
        return '\xe2\x8b\xb1'
s = S()
import pudb; pudb.set_trace()  # XXX DEBUG

$ python killer.py

wookayin avatar Oct 04 '17 18:10 wookayin

An attempt to fix this: eee683d4.

@inducer Do you see any side-effects, or could you please answer what is the purpose of .encode() in the line?

wookayin avatar Oct 28 '17 21:10 wookayin

With your patch, in Python 2, s is shown as . In Python 3, it's shown as ⋱.

asmeurer avatar Oct 29 '17 03:10 asmeurer

@asmeurer Oh I see, I will try to come up with a decent solution that works with both python 2 and 3.

wookayin avatar Oct 29 '17 04:10 wookayin

As the comment suggests, TextCanvas does seem to require an encoded count.

The problem is that in Python 2, an object's repr can already be encoded, since there is no distinction between bytes and strings.

Strictly speaking, if your program does have '\xe2\x8b\xb1' or '⋱', this should be considered a bug. (As I pointed out, in Python 3, '\xe2\x8b\xb1' is a three-character string, whereas b'\xe2\x8b\xb1' represents the byte-encoded version of '⋱'.) You need either u'⋱' or b'\xe2\x8b\xb1'.decode('utf-8') to properly obtain that character in Python 2.

I think the correct fix is to add a try except to the encode, and assume that if the encode fails with the above error in Python 2, it's because the string should already be a byte string.
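
A minimal sketch of that try/except idea. It assumes the encoding is passed in as a parameter (target_encoding stands in for Urwid's _target_encoding) and that a chunk which cannot be encoded is already a byte string. This is an illustration only, not the patch that ended up in pudb:

```python
def get_byte_line_attr(line, line_attr, target_encoding="utf-8"):
    """Convert (label, column_count) pairs into (label, byte_count) pairs."""
    i = 0
    for label, column_count in line_attr:
        chunk = line[i:i + column_count]
        try:
            byte_count = len(chunk.encode(target_encoding))
        except (UnicodeDecodeError, AttributeError):
            # Assumption: the chunk is already encoded bytes.
            # Python 2 raises UnicodeDecodeError here; Python 3 bytes
            # objects have no encode() at all, hence AttributeError.
            byte_count = len(chunk)
        i += column_count
        yield label, byte_count


# "a" + U+22F1 is 1 + 3 bytes in UTF-8; "b" is 1 byte.
attrs = list(get_byte_line_attr("a\u22f1b", [("name", 2), ("value", 1)]))
```

Note that for an already-encoded line the column counts index bytes rather than characters, so the attribute spans may not line up with what is drawn on screen.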

asmeurer avatar Oct 29 '17 04:10 asmeurer

I get an error on the same line of code, on Python 3.6 on a Mac. Removing .encode(_target_encoding) as suggested in the first comment fixed it. Is there any reason why it wouldn't be a good idea to just write a PR and change that code? Or any other solution, really. A weird character is imho better than a crash, and this has bitten me so many times already.

The character that makes it break is probably different, as I get a different error: a UnicodeEncodeError rather than a UnicodeDecodeError (i.e. UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 25: ordinal not in range(128)).

The complete trace looks like this:

...
  File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/bdb.py", line 51, in trace_dispatch
    return self.dispatch_line(frame)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/debugger.py", line 187, in dispatch_line
    self.user_line(frame)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/debugger.py", line 408, in user_line
    self.interaction(frame)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/debugger.py", line 376, in interaction
    show_exc_dialog=show_exc_dialog)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/debugger.py", line 2118, in call_with_ui
    return f(*args, **kwargs)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/debugger.py", line 2362, in interaction
    self.event_loop()
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/debugger.py", line 2320, in event_loop
    canvas = toplevel.render(self.size, focus=True)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 1765, in render
    canv = get_delegate(self).render(size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/container.py", line 1086, in render
    focus and self.focus_part == 'body')
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/decoration.py", line 226, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/container.py", line 2087, in render
    focus = focus and self.focus_position == i)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 1765, in render
    canv = get_delegate(self).render(size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/container.py", line 1529, in render
    canv = w.render((maxcol, rows), focus=focus and item_focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/decoration.py", line 226, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/container.py", line 1529, in render
    canv = w.render((maxcol, rows), focus=focus and item_focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/decoration.py", line 226, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 1765, in render
    canv = get_delegate(self).render(size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/listbox.py", line 501, in render
    canvas = widget.render((maxcol,))
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/urwid/widget.py", line 144, in cached_render
    canv = fn(self, size, focus=focus)
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/var_view.py", line 191, in render
    return make_canvas(text, attr, maxcol, apfx+"value")
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/ui_tools.py", line 48, in make_canvas
    line_attr = list(get_byte_line_attr(line, line_attr))
  File "/Users/ish/code/mota/venv/lib/python3.6/site-packages/pudb/ui_tools.py", line 44, in get_byte_line_attr
    byte_count = len(line[i:i+column_count].encode(_target_encoding))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 25: ordinal not in range(128)

PS: thanks for looking into it.

isaacbernat avatar Oct 11 '18 12:10 isaacbernat

@isaacbernat Is your version of pudb recent enough to include the changes from #273? (2018.1 or newer?)

inducer avatar Oct 11 '18 15:10 inducer

Hi @inducer , the version I have is PuDB 2018.1 (as seen when entering debugging mode). I got it from pip (if that makes a difference).

(venv) ➜  ~ pip show pudb
Name: pudb
Version: 2018.1
Summary: A full-screen, console-based Python debugger
Home-page: https://github.com/inducer/pudb
Author: Andreas Kloeckner
Author-email: [email protected]
License: UNKNOWN
Location: /Users/ish/code/mota/venv/lib/python3.6/site-packages
Requires: urwid, pygments

isaacbernat avatar Oct 12 '18 07:10 isaacbernat

I just found pudb and it looked nice until I noticed it cannot handle unicode (similar errors as above). Any fix on this? Version is 2019.2

Thanks!

anttin2020 avatar Jun 07 '20 11:06 anttin2020

What I can say is that it contains code to handle Unicode that seems to work reliably for me, so it's not like we're not trying. (See e.g. here.) What version of Python are you on? Could you post steps to reproduce? (Ideally, a minimal source file.)

inducer avatar Jun 07 '20 20:06 inducer

Thank you for fast reply :)

It can't get much simpler than this :D I created this in VSCode and the status bar says it is UTF-8, line feed is LF (Linux). And the coding is marked the normal way in the file:

#!/opt/alt/python37/bin/python3
# -*- coding: utf8 -*-

s="testing unicode äöäöäö"
print(s)

When I try to run that with python3 -m pudb.run test.py, I notice these problems:

  1. The unicode characters in source are shown as '?':
   1 #!/opt/alt/python37/bin/python3                                Variables:
   2 # -*- coding: utf8 -*-
   3
>  4 s="testing unicode ??????"
   5 print(s)
   6
   7
  2. When I press n on the current line (line 4, not even the print command yet), I get this (the full LONG log: pudblog.txt):
Traceback (most recent call last):                     
  File "/home/username/.local/lib/python3.7/site-packages/pudb/__init__.py", line 153, in runscript
    dbg._runscript(mainpyfile)
  File "/home/username/.local/lib/python3.7/site-packages/pudb/debugger.py", line 468, in _runscript
    self.run(statement)
  File "/opt/alt/python37/lib64/python3.7/bdb.py", line 585, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "test.py", line 5, in <module>
    print(s)
  File "test.py", line 5, in <module>
    print(s)
  File "/opt/alt/python37/lib64/python3.7/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/home/username/.local/lib/python3.7/site-packages/pudb/debugger.py", line 189, in dispatch_line
    self.user_line(frame)
.
.
  File "/home/username/.local/lib/python3.7/site-packages/pudb/ui_tools.py", line 50, in make_canvas
    line_attr = list(get_byte_line_attr(line, line_attr))
  File "/home/username/.local/lib/python3.7/site-packages/pudb/ui_tools.py", line 46, in get_byte_line_attr
    byte_count = len(line[i:i+column_count].encode(_target_encoding))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-26: ordinal not in range(128)

And of course running the code directly with python works fine:

$ python3 test.py
testing unicode äöäöäö

Is there something I need to do in pudb to get it to understand unicode? Right now it apparently thinks everything is ASCII. That is strange, because in Python 3 (quoting the Python 3 documentation): "Since Python 3.0, the language’s str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode." So in Python 3 there should not even be such a problem anymore :)

Version/system info:

  • This is webhotel, based on cPanel system
  • Python 3.7.3
  • Linux version 3.10.0-962.3.2.lve1.5.25.10.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Wed May 29 04:37:40 EDT 2019
  • I installed it with python3 -m pip install pudb

I hope you can help me, pudb would seem very nice tool but if I cannot use it with unicode code/data, I cannot use it.

Thank you!

anttin2020 avatar Jun 08 '20 05:06 anttin2020

Works for me without an issue. (on Debian testing) "Proof:"

[screenshot]

In your traceback, it seems that Urwid (the widget library on which Pudb is based) decides that ascii is its target encoding. (from urwid.util import _target_encoding) What version of Urwid do you have?

One thing I learned from your traceback is that pudb (kind of needlessly) crashes if the encode for the source line fails. In #387, I've replaced that with a more useful strategy, matching what Urwid does internally. It doesn't address the core issue on codec choice, but it should get rid of the crashes. Could you try that and let me know how that went?
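
One way to avoid a crash at this point is to pass an error handler to encode() so that unencodable characters are substituted instead of raising. The sketch below assumes the standard "replace" handler and a hypothetical helper name, safe_byte_count; the thread does not show what #387 actually does:

```python
def safe_byte_count(text, target_encoding):
    """Byte length of text in target_encoding, never raising on bad characters."""
    # "replace" substitutes b"?" for every character the codec cannot encode.
    return len(text.encode(target_encoding, "replace"))


# The source line from the report, with the umlauts written as escapes.
source_line = 's="testing unicode ' + "\u00e4\u00f6" * 3 + '"'

ascii_count = safe_byte_count(source_line, "ascii")  # one byte per character
utf8_count = safe_byte_count(source_line, "utf-8")   # umlauts take two bytes

# What an ASCII target would put on screen for this line.
shown = source_line.encode("ascii", "replace").decode("ascii")
```

With an ASCII target every non-ASCII character becomes '?', which is consistent with the '?' placeholders reported in the source view above.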

inducer avatar Jun 08 '20 14:06 inducer

Hi! Thank you again for fast reply :)

Urwid is 2.1.0 and it should be the one that got installed with pudb (as far as I know, I didn't have it earlier but I cannot be 100% sure).

Anyway, yes I would be happy to try the new update but how can I install it? :o Sorry if this is stupid question but I have not used Github for other than reading or downloading from the code section and in this case I installed pudb with pip. So can you please tell me what to do to get the latest update? :)

Thank you!

anttin2020 avatar Jun 08 '20 15:06 anttin2020

This should do it:

pip uninstall pudb
pip install git+https://github.com/inducer/pudb.git@code-encoding-fallback

inducer avatar Jun 08 '20 15:06 inducer

Yes now it does not crash, thank you :) But still it shows the unicode as '?' in source code, what can cause that?

anttin2020 avatar Jun 08 '20 16:06 anttin2020

> Yes now it does not crash, thank you :)

Thanks! I'll merge #387 then.

> But still it shows the unicode as '?' in source code, what can cause that?

As I said above, that's because Urwid somehow decides to use ascii as a codec. You could check with them to see if that's a known problem.
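
A quick way to see what the process locale looks like from Python, on the assumption (not confirmed in this thread) that Urwid derives its target encoding from the locale. The function name and the returned keys are made up for illustration; only standard-library calls are used:

```python
import locale
import os
import sys


def describe_encoding_environment(environ=None):
    """Collect the settings that commonly influence codec selection."""
    environ = os.environ if environ is None else environ
    return {
        # Locale variables, listed in the usual order of precedence.
        "LC_ALL": environ.get("LC_ALL"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        "LANG": environ.get("LANG"),
        # What Python itself reports for this process.
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


report = describe_encoding_environment()
for key, value in report.items():
    print(f"{key}: {value!r}")
```

If the locale variables are unset, or set to C or POSIX, an ASCII codec is a plausible outcome; running the same check in a terminal where the characters display correctly gives a useful comparison.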

inducer avatar Jun 08 '20 16:06 inducer

Ah, ok. Anyway, now I can use pudb, thank you very much for your help :)

BTW, do you happen to use VSCode? I have big issues with it and the developers don't seem to care (I posted an issue a month ago, still no replies). I know this is totally off-topic, but I'm pretty desperate. Do you think you could try to help if you are familiar with it, and especially with the remote development extension?

anttin2020 avatar Jun 08 '20 17:06 anttin2020

While I'm generally happy to support my own software, I'm unable to provide free tech support for projects to which I have no connection.

inducer avatar Jun 08 '20 17:06 inducer

Yes of course, I understand that, I just thought that if you might have seen/heard about similar problems (excessive process count 80-100). I will try to find someone in the development who would be interested in helping me, seems difficult so far but I'll keep trying :)

I will try to deal with the urwid problem with those developers and it is not serious problem anymore, now I can use pudb with my code.

Thank you very much and have a nice day :)

anttin2020 avatar Jun 08 '20 18:06 anttin2020

What happened? The install command does not work anymore :o I wanted to make sure I have latest urwid as well so I uninstalled both pudb and urwid and normal install of pudb works but the git command you gave now says

$ pip install git+https://github.com/inducer/pudb.git@code-encoding-fallback
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/inducer/pudb.git@code-encoding-fallback
 Cloning https://github.com/inducer/pudb.git (to revision code-encoding-fallback) to /tmp/pip-req-build-z4045ce2
Running command git clone -q https://github.com/inducer/pudb.git /tmp/pip-req-build-z4045ce2
 WARNING: Did not find branch or tag 'code-encoding-fallback', assuming revision or ref.
 Running command git checkout -q code-encoding-fallback
 error: pathspec 'code-encoding-fallback' did not match any file(s) known to git

How can I get the fixed version again?

anttin2020 avatar Jun 08 '20 18:06 anttin2020

Ok I tried this (guessed based on the note on that page): pip install git+https://github.com/inducer/pudb.git@master and it installs and works, still source unicode as '?' but works :)

anttin2020 avatar Jun 08 '20 18:06 anttin2020

The install command doesn't work because @inducer deleted the branch. You can not just install from master.

The only way I can reproduce an error here is if I use a terminal that cannot display unicode characters (I get an error in this case even in pudb master). Otherwise for me it shows the characters. What terminal are you using? Also, what is this:

> -This is webhotel, based on cPanel system

asmeurer avatar Jun 08 '20 19:06 asmeurer

> The install command doesn't work because @inducer deleted the branch. You can not just install from master.

But as I said, that is exactly how I got it to work after he had removed the code-encoding-fallback branch :D

> The only way I can reproduce an error here is if I use a terminal that cannot display unicode characters (I get an error in this case even in pudb master). Otherwise for me it shows the characters. What terminal are you using?

I use Bitvise SSH Client's (Win10) terminal which is xterm with UTF-8 codepage. And directly running the test.py works. But if I use VSCode's terminal it shows the unicode correctly so now I'm confused. How is it possible that in Bitvise terminal the direct run shows unicode correctly but if I use pudb it shows '?' :O

> Also, what is this:
>
> > -This is webhotel, based on cPanel system

That is the environment I am in, it is Linux but as it is webhotel that is based on cPanel control panel system, it has limited resources (max. 100 processes, 1GB RAM etc.). Probably not related to this problem but I wanted to give all info I could :)

anttin2020 avatar Jun 09 '20 04:06 anttin2020