vim-autoformat

Use system default encoding when passing code to PIPE

Open NoAnyLove opened this issue 8 years ago • 2 comments

On Windows, if the source code contains non-ASCII characters, autopep8 and yapf fail to format it. For example,

# coding: utf-8
print("中文")

Formatting the code above with autopep8 outputs:

"format_test.py" 3L, 38C
Trying definition from g:formatdef_autopep8
Evaluated formatprg: autopep8 - --max-line-length=80
Using python 3 code...
Formatter autopep8 has errors: b'Traceback (most recent call last):\r\n  File "e:\\python36\\lib\\runpy.py", line 193, in _run_module_as_main\r\n    "__main__", mod_spec)\r\n  File "e:\\python36\\lib\\runpy.py", line 85, in _run_code\r\n    exec(code, run_globals)\r\n  File "E:\\Python36\\Scripts\\autopep8.exe\\__main__.py", line 9, in <module>\r\n  File "e:\\python36\\lib\\site-packages\\autopep8.py", line 3803, in main\r\n    fix_code(sys.stdin.read(), args, encoding=encoding))\r\nUnicodeEncodeError: \'gbk\' codec can\'t encode character \'\\udcad\' in position 25: illegal multibyte sequence\r\n'
Definition in 'g:formatdef_autopep8' was unsuccessful.
No format definitions were successful.
Removing trailing whitespace...
Retabbing...
Autoindenting...
2 lines to indent... 
3 lines indented 

and yapf outputs,

Trying definition from g:formatdef_yapf
Evaluated formatprg: yapf --style="{based_on_style:pep8,indent_width:4,column_limit:80}" -l 1-3
Using python 3 code...
Formatter yapf has errors: b'Traceback (most recent call last):\r\n  File "e:\\python36\\lib\\runpy.py", line 193, in _run_module_as_main\r\n    "__main__", mod_spec)\r\n  File "e:\\python36\\lib\\runpy.py", line 85, in _run_code\r\n    exec(code, run_globals)\r\n  File "E:\\Python36\\Scripts\\yapf.exe\\__main__.py", line 9, in <module>\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\__init__.py", line 306, in run_main\r\n    sys.exit(main(sys.argv))\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\__init__.py", line 177, in main\r\n    file_resources.WriteReformattedCode(\'<stdout>\', reformatted_source)\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\yapflib\\file_resources.py", line 99, in WriteReformattedCode\r\n    py3compat.EncodeAndWriteToStdout(reformatted_code)\r\n  File "e:\\python36\\lib\\site-packages\\yapf\\yapflib\\py3compat.py", line 80, in EncodeAndWriteToStdout\r\n    sys.stdout.buffer.write(s.encode(encoding))\r\nUnicodeEncodeError: \'utf-8\' codec can\'t encode character \'\\udcad\' in position 25: surrogates not allowed\r\n'
Definition in 'g:formatdef_yapf' was unsuccessful.
No format definitions were successful.
Removing trailing whitespace...
Retabbing...
Autoindenting...
2 lines to indent... 
3 lines indented 

This issue is related to #25, and only occurs on Windows. The reason is as follows.

Python 3 uses UTF-8 as its default source encoding, and so do most Linux systems, so source code passed through a pipe there is effectively always UTF-8. On Windows it gets tricky. With the following code, we can check which encoding Windows uses:

import sys
import os

print("Is a tty: {}".format(os.isatty(sys.stdin.fileno())))
print(sys.stdin.encoding)

> python3 test_stdin_encoding.py
Is a tty: True
utf-8

> echo "hello" | python3 test_stdin_encoding.py
Is a tty: False
cp936

Windows does not always use UTF-8 (code page 65001) as the default console encoding; in fact, it rarely does. We can change the system settings to force Windows to use UTF-8, but I think that is beyond the scope of this issue.

On the other hand, vim-autoformat always encodes the source code as UTF-8 and passes it to the pipe, but the formatter program has no idea which encoding was used and may assume it is the system default.
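To make the mismatch concrete, here is a minimal sketch (not code from vim-autoformat or either formatter) of what happens when the sender writes UTF-8 bytes and the reader assumes GBK, the codec behind cp936. The helper `read_as` is a hypothetical name used only for illustration.

```python
SOURCE = '# coding: utf-8\nprint("中文")\n'


def read_as(data: bytes, encoding: str):
    """Decode piped bytes as a reader assuming `encoding` would.

    Returns None when the bytes are not valid in that encoding.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


piped = SOURCE.encode("utf-8")    # what the sender writes to the pipe
same = read_as(piped, "utf-8")    # reader agrees with the sender
mismatch = read_as(piped, "gbk")  # reader assumes the system code page
```

When both sides agree, `same` equals the original source. In the mismatched case the reader either fails to decode or ends up with different text, so the formatter never sees the code that was actually in the buffer.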

I recommend getting the encoding at run time and using it to encode the code. For example,

# L249-250 in autoformat.vim
encoding = sys.stdin.encoding
text = bytes(os.linesep.join(vim.current.buffer[:]) + os.linesep, encoding)

# L276 in autoformat.vim
stdoutdata = stdoutdata.decode(encoding)

This should fix both the issue described above and #25. It should also have no negative effect on other systems and encodings.
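As a rough, self-contained illustration of the same idea outside Vim, the sketch below pipes text through an external command and uses one encoding for both directions. The function name `run_formatter`, the fallback to `locale.getpreferredencoding`, and the echoing stand-in command are assumptions for this example, not part of vim-autoformat; it also joins lines with "\n" rather than `os.linesep` to keep the example short.

```python
import locale
import subprocess
import sys


def run_formatter(cmd, lines, encoding=None):
    """Pipe buffer lines through `cmd`, encoding and decoding consistently."""
    if encoding is None:
        # Assumption: use the stdin encoding as proposed in this issue,
        # and fall back to the locale's preferred encoding if it is unset.
        encoding = (getattr(sys.stdin, "encoding", None)
                    or locale.getpreferredencoding(False))
    text = "\n".join(lines) + "\n"
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate(text.encode(encoding))
    if proc.returncode != 0:
        raise RuntimeError(err.decode(encoding, errors="replace"))
    # Decode the formatter's output with the encoding used for its input.
    return out.decode(encoding)


# Hypothetical stand-in "formatter" that echoes its stdin bytes unchanged.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
```

Calling `run_formatter(ECHO, ['print("中文")'], encoding="gbk")` round-trips the text, because the same encoding is applied on the way in and on the way out. A real formatter would still need to read its stdin with that encoding for the fix to help.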

NoAnyLove, Sep 20 '17 01:09

Thanks for supplying this information. I wasn't aware that the Python code writes in an encoding different from the system default. This does need to be solved, and I will have a look at it when I have time.

chtenb, Sep 20 '17 05:09

+1, I hope this gets fixed quickly, please!

They have already fixed it in yapf: https://github.com/google/yapf/issues/449

NewUserHa, Aug 21 '19 08:08