pynvim replace_termcodes with DecodeHook fails on non-ASCII input

vim.replace_termcodes seems to be broken when used on a vim client with a neovim.DecodeHook installed. It seems like it might be trying to double-decode input or something.

An example from maktaba:

>>> vim.replace_termcodes(u":let g:weirdpath = maktaba#path#Join([g:repo, 'weird¬p…l✓u↓g⏎i‽n'])<CR>")
b":let g:weirdpath = maktaba#path#Join([g:repo, 'weird\xc2\xacp\xe2\x80\xfeX\xa6l\xe2\x9c\x93u\xe2\x86\x93g\xe2\x8f\x8ei\xe2\x80\xfeX\xbdn'])\r"
>>> vim.with_hook(neovim.DecodeHook()).replace_termcodes(u":let g:weirdpath = maktaba#path#Join([g:repo, 'weird¬p…l✓u↓g⏎i‽n'])<CR>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dbarnett/.local/lib/python3.4/site-packages/neovim/api/nvim.py", line 168, in replace_termcodes
    from_part, do_lt, special)
  File "/home/dbarnett/.local/lib/python3.4/site-packages/neovim/api/common.py", line 213, in request
    'out-request')
  File "/home/dbarnett/.local/lib/python3.4/site-packages/neovim/api/common.py", line 240, in walk
    return fn(obj, *args)
  File "/home/dbarnett/.local/lib/python3.4/site-packages/neovim/api/common.py", line 148, in <lambda>
    return lambda o, s, m, k: f1(f2(o, s, m, k), s, m, k)
  File "/home/dbarnett/.local/lib/python3.4/site-packages/neovim/api/common.py", line 170, in _decode_if_bytes
    return obj.decode(self.encoding, errors=self.encoding_errors)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 55-56: invalid continuation byte

I tried some other variations of passing pre-encoded bytes and such, and couldn't find anything that wouldn't blow up.

It would also be good to get unicode strings out at least when passing unicode strings in.

Oct 24 '14 05:10 dbarnett

Ahhh, it actually makes some sense that replace_termcodes would return invalid strings. The returned value is the internal representation Vim uses, i.e. there may be some escaping going on.
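To make the escaping concrete, here is a small sketch that uses only a fragment of the bytes from the report above. It assumes that the three-byte sequence 0x80 0xfe 'X' seen in that output stands for a single literal 0x80 byte; the helper names are made up for illustration and are not part of the client.

```python
# Fragment of the bytes from the report, corresponding to the text 'weird¬p…l'.
RAW = b"weird\xc2\xacp\xe2\x80\xfeX\xa6l"

# Assumption: this 3-byte sequence is how a literal 0x80 byte is escaped.
ESCAPED_0X80 = b"\x80\xfeX"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def naive_unescape(data: bytes) -> bytes:
    """Undo only the 0x80 escape; other internal key codes are left alone."""
    return data.replace(ESCAPED_0X80, b"\x80")


# The escaped form is not valid UTF-8; the unescaped form is.
print(is_valid_utf8(RAW), is_valid_utf8(naive_unescape(RAW)))
```

This only explains why a strict decode of the result fails. It is not a general way to turn replace_termcodes output back into text, since real termcodes such as <CR> or function keys would not be handled.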

Quick fix is to call replace_termcodes from a session without DecodeHook. The "right way(TM)" is to have replace_termcodes return a new Binary type that is not treated as a decodable string.

Oct 24 '14 08:10 equalsraf

> The "right way(TM)" is to have replace_termcodes return a new Binary type that is not treated as a decodable string.

Can't that be done with another SessionHook? For example, see how the ScriptHost class uses a hook to emulate the legacy behavior of eval.

Oct 24 '14 10:10 tarruda

@tarruda Yes, I created a pull request, google/vroom#78, that keeps two Nvim objects: one with the DecodeHook and one without. It seems to work as intended.
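A rough sketch of that two-object idea, for readers who want the shape of it. This is not the code from the pull request; DualClient and RAW_METHODS are hypothetical names, and the only client calls assumed are the ones already shown in the report (with_hook and neovim.DecodeHook).

```python
class DualClient:
    """Hypothetical wrapper that holds two client objects.

    Calls listed in RAW_METHODS go to the client without a DecodeHook,
    so their results stay as bytes. Everything else goes to the
    decoding client.
    """

    RAW_METHODS = frozenset({"replace_termcodes"})

    def __init__(self, raw_client, decoding_client):
        self._raw = raw_client
        self._decoded = decoding_client

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself.
        target = self._raw if name in self.RAW_METHODS else self._decoded
        return getattr(target, name)


# Intended use, following the calls shown in the report (not run here):
#   client = DualClient(vim, vim.with_hook(neovim.DecodeHook()))
#   keys = client.replace_termcodes(u"...<CR>")   # bytes, never decoded
```

The routing table keeps the workaround in one place, but it is still brittle in the sense raised in this thread: any other call that returns non-text bytes has to be added to RAW_METHODS by hand.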

Oct 24 '14 10:10 equalsraf

Sounds fine as a workaround. Is it possible to make this less brittle? I don't understand where the decoding problem arises, but it seems like python should have enough context to DTRT.

Oct 24 '14 17:10 dbarnett

> Sounds fine as a workaround. Is it possible to make this less brittle? I don't understand where the decoding problem arises, but it seems like python should have enough context to DTRT.

Possible, yes, but not in the short run. As far as DTRT goes, replace_termcodes() returns invalid strings by design, and because the DecodeHook tries to convert every binary string into Unicode, decoding that result raises the error. At this point it is not possible to enable or disable the DecodeHook per function call.
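For context, the traceback shows the hook calling obj.decode(self.encoding, errors=self.encoding_errors), so the failure comes from a strict error mode. The sketch below uses plain bytes.decode only (how the hook's error mode is configured is not shown in this thread, so nothing about that is assumed) to show that a lenient mode such as surrogateescape avoids the exception and keeps the bytes recoverable, while still not producing the original text.

```python
# Fragment of the reported output that corresponds to the text 'p…l'.
RAW = b"p\xe2\x80\xfeX\xa6l"


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, mapping undecodable bytes to lone surrogates."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_back(text: str) -> bytes:
    """Reverse decode_lenient, restoring the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


lenient = decode_lenient(RAW)

# No exception, and the original bytes can be recovered exactly...
print(encode_back(lenient) == RAW)
# ...but the string is not the text that was passed in.
print(lenient == "p\u2026l")
```

So a lenient error mode would only hide the problem. The result is still Vim's escaped representation, which is why returning bytes for this call is the cleaner option.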

Oct 26 '14 14:10 equalsraf

With the latest changes, we could change replace_termcodes to always return bytes.

Apr 02 '16 18:04 bfredl
