js-beautify
js-beautify copied to clipboard
Python js-beautifier fails when script contains UTF-8, and output is piped
$ cat test.js
var s = "åäö";
$ js-beautify test.js
var s = "åäö";
$ js-beautify test.js | cat -
'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)
$ js-beautify --version
1.6.4
$ python --version
Python 2.7.12
@tobbez - I had a similar issue crop up here: https://github.com/beautify-web/js-beautify/commit/a0bc8c3ecb8b3abcba5d8be2c9a3412171a25ef3#diff-319aee52866e5899a11a7f49987984b4R2302
It would be great help in understanding this if you could do any or all of the following:
- Update to js-beautify 1.6.8 and test again.
- Downgrade to Python 2.7.10 and test again.
- Upgrade to Python 3.x and test again.
$ ./venv27/bin/python --version
Python 2.7.12
$ ./venv27/bin/js-beautify --version
1.6.8
$ ./venv27/bin/js-beautify test.js | cat -
'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)
$ ./venv35/bin/python --version
Python 3.5.2
$ ./venv35/bin/js-beautify --version
1.6.8
$ ./venv35/bin/js-beautify test.js | cat -
var s = "åäö";
$ ./py279/bin/python --version
Python 2.7.9
$ ./py279/bin/js-beautify --version
1.6.8
$ ./py279/bin/js-beautify test.js | cat -
'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)
I don't have easy access to 2.7.10, but I don't think this is a regression in Python 2.7 anyway
@tobbez - Thanks for your fast response.
@tobbez - If you can provide any help fixing this, that would be great.
After some research, the best way is likely to add this at the start of the cli program:
import sys
import codecs
import locale
# Wrap stdout when output from the program is redirected
if sys.stdout.encoding is None:
try:
# Python 3
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout.detach())
except:
# Python 2
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
In Python 2, we create a writer connected to sys.stdout
that encodes to the preferred encoding, and assigns that to sys.stdout
.
In Python 3, we need to additionally retrieve the underlaying raw stream (using detach
), and connect the writer to that. This is because sys.stdout
in Python 3 expects str
, while the writer writes bytes
.
The Python 3 approach is tried first, because detach
was first introduced in Python 3.1, so a raised exception means the code is running on Python 2.
(The code above should obviously never be included in a library.)
Thanks for doing the research for this, that helps a lot.
You said: "(The code above should obviously never be included in a library.)" Could you explain? Currently the cli part of the code is not separate from the rest of the code in the Python implementation.
If you include it inside a library, users of that library will have their stdout
replaced behind their backs just by importing the library, which may lead to unintended side effects and/or errors that are hard to track down.
@tobbez - Oh, you mean putting that if
block right at the top of the file. Got it.
No, I'd make that the last step right before writing to stdout. 😄
Putting it right at the top of the file is fine as long as it's an application (and was actually intended for that).
Anyway, I didn't look too closely at the existing code before writing the above, but doing so makes it clear that the above code is much more general than is needed here. That solution is useful when writing to standard out in many places, but seeing as js-beautifier only writes output in a single place, the following should suffice:
diff --git a/python/jsbeautifier/__init__.py b/python/jsbeautifier/__init__.py
index 9eecb59..3697513 100644
--- a/python/jsbeautifier/__init__.py
+++ b/python/jsbeautifier/__init__.py
@@ -7,6 +7,7 @@ import re
import string
import errno
import copy
+import locale
from jsbeautifier.__version__ import __version__
#
@@ -2316,7 +2317,16 @@ def main():
import msvcrt
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
- sys.stdout.write(pretty)
+ if sys.stdout.encoding is None:
+ # sys.stdout.encoding will be None if stdout is redirected (at
+ # least on Python 2), leading to encoding errors when stdout is
+ # redirected and the beautified content contains unicode
+ try:
+ sys.stdout.buffer.write(pretty.encode(locale.getpreferredencoding()))
+ except:
+ sys.stdout.write(pretty.encode(locale.getpreferredencoding()))
+ else:
+ sys.stdout.write(pretty)
else:
if isFileDifferent(outfile, pretty):
mkdir_p(os.path.dirname(outfile))
You're about 90% of the way having written the PR at this point. 😄
All that's needed adding file with the contents you've shown to js/test/resources
And then adding a couple tests test to python/jsbeautifier/tests/shell-smoke-test.sh and js/test/shell-smoke-test.sh
It's cool if you don't have time to do the last bit - I'll get to it soon. Thanks again!
Hey @bitwiseman, did this get fixed by chance? I looked through recent changes here https://github.com/beautify-web/js-beautify/commits/master but didn't find any relevant ones.
I'm on: EditorConfig (0.12.1) jsbeautifier (1.6.12)
I tested it on Python 2.7.9 thru 2.7.13 on both brew and pkg's to try to get it to work, only to find that the bug seems to manifest always when redirecting the output, regardless of version.
It works as intended if I do not redirect the output (and the program outputs to stdout), but as soon as I pipe or redirect the output to anything, the program shatters (the output does not get correctly redirected) and emits to stderr a message such as:
'ascii' codec can't encode character u'\u....' in position 376942: ordinal not in range(128)
This makes using js-beautify an excellent adventure in copy/paste when beautifying larger files.
And just now reading @tobbez's above proposed change which I somehow failed to read before debugging it, that solution should fix it exactly.
Bump :)
the bug is still there: js-beautifier -o test.b.js test.js works
js-beautifier >test.b.js test.js gives the error.