Use system default encoding when passing code to PIPE
On Windows, if the source code contains non-ASCII characters, autopep8 and yapf fail to format the code. For example,
# coding: utf-8
print("中文")
Formatting the code above with autopep8 produces the following output:
"format_test.py" 3L, 38C
Trying definition from g:formatdef_autopep8
Evaluated formatprg: autopep8 - --max-line-length=80
Using python 3 code...
Formatter autopep8 has errors: b'Traceback (most recent call last):\r\n File "e:\\python36\\lib\\runpy.py", line 193, in _run_module_as_main\r\n "__main__", mod_spec)\r\n File "e:\\python36\\lib\\runpy.py", line 85, in _run_code\r\n exec(code, run_globals)\r\n File "E:\\Python36\\Scripts\\autopep8.exe\\__main__.py", line 9, in <module>\r\n File "e:\\python36\\lib\\site-packages\\autopep8.py", line 3803, in main\r\n fix_code(sys.stdin.read(), args, encoding=encoding))\r\nUnicodeEncodeError: \'gbk\' codec can\'t encode character \'\\udcad\' in position 25: illegal multibyte sequence\r\n'
Definition in 'g:formatdef_autopep8' was unsuccessful.
No format definitions were successful.
Removing trailing whitespace...
Retabbing...
Autoindenting...
2 lines to indent...
3 lines indented
With yapf, the output is:
Trying definition from g:formatdef_yapf
Evaluated formatprg: yapf --style="{based_on_style:pep8,indent_width:4,column_limit:80}" -l 1-3
Using python 3 code...
Formatter yapf has errors: b'Traceback (most recent call last):\r\n File "e:\\python36\\lib\\runpy.py", line 193, in _run_module_as_main\r\n "__main__", mod_spec)\r\n File "e:\\python36\\lib\\runpy.py", line 85, in _run_code\r\n exec(code, run_globals)\r\n File "E:\\Python36\\Scripts\\yapf.exe\\__main__.py", line 9, in <module>\r\n File "e:\\python36\\lib\\site-packages\\yapf\\__init__.py", line 306, in run_main\r\n sys.exit(main(sys.argv))\r\n File "e:\\python36\\lib\\site-packages\\yapf\\__init__.py", line 177, in main\r\n file_resources.WriteReformattedCode(\'<stdout>\', reformatted_source)\r\n File "e:\\python36\\lib\\site-packages\\yapf\\yapflib\\file_resources.py", line 99, in WriteReformattedCode\r\n py3compat.EncodeAndWriteToStdout(reformatted_code)\r\n File "e:\\python36\\lib\\site-packages\\yapf\\yapflib\\py3compat.py", line 80, in EncodeAndWriteToStdout\r\n sys.stdout.buffer.write(s.encode(encoding))\r\nUnicodeEncodeError: \'utf-8\' codec can\'t encode character \'\\udcad\' in position 25: surrogates not allowed\r\n'
Definition in 'g:formatdef_yapf' was unsuccessful.
No format definitions were successful.
Removing trailing whitespace...
Retabbing...
Autoindenting...
2 lines to indent...
3 lines indented
This issue is related to #25, and only occurs on Windows. The reason is as follows.
Python 3 uses UTF-8 as its default source encoding, and Linux systems typically use UTF-8 as well, so source code passed through a pipe is UTF-8 on both ends. On Windows it is trickier. The following script shows which encoding Python uses for stdin on Windows:
import sys
import os
print("Is a tty: {}".format(os.isatty(sys.stdin.fileno())))
print(sys.stdin.encoding)
> python3 test_stdin_encoding.py
Is a tty: True
utf-8
> echo "hello" | python3 test_stdin_encoding.py
Is a tty: False
cp936
Windows does not always use UTF-8 (code page 65001) as the default console encoding; in fact, it rarely does. We can change the system settings to force Windows to use UTF-8, but I think that is beyond the scope of this issue.
vim-autoformat, on the other hand, always encodes the source code as UTF-8 before passing it to the pipe, but the formatter has no idea which encoding was used and may assume it is the system default.
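To make the mismatch concrete, here is a minimal sketch that reproduces it without Vim or any formatter. It assumes the receiving process decodes its stdin as cp936 (the code page reported above) with the `surrogateescape` error handler, which is what the lone surrogate `\udcad` in the tracebacks suggests. The variable names are only for illustration.

```python
# Minimal sketch of the encoding mismatch, independent of Vim and of any formatter.
source = 'print("中文")\n'

# What vim-autoformat is described as doing: always encode the buffer as UTF-8.
piped = source.encode("utf-8")

# Assumption: the receiving process decodes stdin with the console code page
# (cp936 in the output above) and the surrogateescape error handler.
garbled = piped.decode("cp936", errors="surrogateescape")

# When both ends agree on the encoding, the text survives the round trip.
intact = piped.decode("utf-8")

print("decoded as cp936 matches the source:", garbled == source)
print("decoded as utf-8 matches the source:", intact == source)
```

Any bytes that cp936 cannot decode end up as lone surrogates in `garbled`, and those cannot be encoded again when the formatter writes its result, which matches the `UnicodeEncodeError` in both tracebacks.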
I recommend getting the encoding at run time and using it to encode the code. For example,
# L249-250 in autoformat.vim
encoding = sys.stdin.encoding
text = bytes(os.linesep.join(vim.current.buffer[:]) + os.linesep, encoding)
# L276 in autoformat.vim
stdoutdata = stdoutdata.decode(encoding)
This should fix the issue described above as well as #25. It should also have no negative effect on other systems and encodings.
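For reference, the proposal can be sketched as a standalone function outside of Vim. This is only an illustration under assumptions, not the actual code in autoformat.vim: `format_through_pipe` and its parameters are hypothetical names, the buffer is passed in as a plain list of lines instead of `vim.current.buffer`, and the fallback to `locale.getpreferredencoding(False)` when `sys.stdin` has no encoding is my own addition.

```python
import locale
import os
import subprocess
import sys


def format_through_pipe(lines, command, encoding=None):
    """Send `lines` to `command` over a pipe, using one encoding in both directions.

    Hypothetical helper: `lines` stands in for vim.current.buffer[:].
    """
    if encoding is None:
        # Ask for the encoding at run time, as proposed above.
        # Assumption: fall back to the locale encoding if stdin has none.
        encoding = getattr(sys.stdin, "encoding", None) or locale.getpreferredencoding(False)

    # Encode the text with the detected encoding instead of a hard-coded UTF-8.
    text = (os.linesep.join(lines) + os.linesep).encode(encoding)

    process = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdoutdata, stderrdata = process.communicate(text)

    # Decode the formatter output with the same encoding that was used for input.
    return stdoutdata.decode(encoding), stderrdata.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Stand-in for a formatter: a child Python process that copies stdin to stdout.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    formatted, errors = format_through_pipe(['print("hello")'], echo)
    print(formatted, end="")
```

The point of the sketch is that the same `encoding` value is used for both the `encode` and the `decode` step, so the child process and the caller agree on how the bytes on the pipe are interpreted.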
Thanks for supplying this information. I wasn't aware that the Python code writes in a different encoding than the system default. This indeed needs to be solved, and I will have a look at it when I have time.
+1, I hope this gets fixed quickly, please!
https://github.com/google/yapf/issues/449 They have already fixed it!