js-beautify icon indicating copy to clipboard operation
js-beautify copied to clipboard

Python js-beautifier fails when script contains UTF-8, and output is piped

Open tobbez opened this issue 8 years ago • 12 comments

$ cat test.js 
var s = "åäö";
$ js-beautify test.js
var s = "åäö";
$ js-beautify test.js | cat -
'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)
$ js-beautify --version
1.6.4
$ python --version
Python 2.7.12

tobbez avatar Jan 04 '17 09:01 tobbez

@tobbez - I had a similar issue crop up here: https://github.com/beautify-web/js-beautify/commit/a0bc8c3ecb8b3abcba5d8be2c9a3412171a25ef3#diff-319aee52866e5899a11a7f49987984b4R2302

It would be great help in understanding this if you could do any or all of the following:

  1. Update to js-beautify 1.6.8 and test again.
  2. Downgrade to Python 2.7.10 and test again.
  3. Upgrade to Python 3.x and test again.

bitwiseman avatar Jan 04 '17 19:01 bitwiseman

$ ./venv27/bin/python --version
Python 2.7.12
$ ./venv27/bin/js-beautify --version
1.6.8
$ ./venv27/bin/js-beautify test.js | cat -
'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)

$ ./venv35/bin/python --version
Python 3.5.2
$ ./venv35/bin/js-beautify --version
1.6.8
$ ./venv35/bin/js-beautify test.js | cat -
var s = "åäö";

$ ./py279/bin/python --version
Python 2.7.9
$ ./py279/bin/js-beautify --version
1.6.8
$ ./py279/bin/js-beautify test.js | cat -
'ascii' codec can't encode characters in position 9-11: ordinal not in range(128)

I don't have easy access to 2.7.10, but I don't think this is a regression in Python 2.7 anyway

tobbez avatar Jan 04 '17 19:01 tobbez

@tobbez - Thanks for your fast response.

bitwiseman avatar Jan 05 '17 00:01 bitwiseman

@tobbez - If you can provide any help fixing this, that would be great.

bitwiseman avatar Jan 19 '17 20:01 bitwiseman

After some research, the best way is likely to add this at the start of the cli program:

import sys
import codecs
import locale

# Wrap stdout when output from the program is redirected
if sys.stdout.encoding is None:
    try:
        # Python 3
        sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout.detach())
    except:
        # Python 2
        sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

In Python 2, we create a writer connected to sys.stdout that encodes to the preferred encoding, and assigns that to sys.stdout.

In Python 3, we need to additionally retrieve the underlaying raw stream (using detach), and connect the writer to that. This is because sys.stdout in Python 3 expects str, while the writer writes bytes.

The Python 3 approach is tried first, because detach was first introduced in Python 3.1, so a raised exception means the code is running on Python 2.

(The code above should obviously never be included in a library.)

tobbez avatar Jan 20 '17 14:01 tobbez

Thanks for doing the research for this, that helps a lot.

You said: "(The code above should obviously never be included in a library.)" Could you explain? Currently the cli part of the code is not separate from the rest of the code in the Python implementation.

bitwiseman avatar Jan 20 '17 17:01 bitwiseman

If you include it inside a library, users of that library will have their stdout replaced behind their backs just by importing the library, which may lead to unintended side effects and/or errors that are hard to track down.

tobbez avatar Jan 20 '17 23:01 tobbez

@tobbez - Oh, you mean putting that if block right at the top of the file. Got it. No, I'd make that the last step right before writing to stdout. 😄

bitwiseman avatar Jan 20 '17 23:01 bitwiseman

Putting it right at the top of the file is fine as long as it's an application (and was actually intended for that).

Anyway, I didn't look too closely at the existing code before writing the above, but doing so makes it clear that the above code is much more general than is needed here. That solution is useful when writing to standard out in many places, but seeing as js-beautifier only writes output in a single place, the following should suffice:

diff --git a/python/jsbeautifier/__init__.py b/python/jsbeautifier/__init__.py
index 9eecb59..3697513 100644
--- a/python/jsbeautifier/__init__.py
+++ b/python/jsbeautifier/__init__.py
@@ -7,6 +7,7 @@ import re
 import string
 import errno
 import copy
+import locale
 from jsbeautifier.__version__ import __version__

 #
@@ -2316,7 +2317,16 @@ def main():
                 import msvcrt
                 msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

-            sys.stdout.write(pretty)
+            if sys.stdout.encoding is None:
+                # sys.stdout.encoding will be None if stdout is redirected (at
+                # least on Python 2), leading to encoding errors when stdout is
+                # redirected and the beautified content contains unicode
+                try:
+                    sys.stdout.buffer.write(pretty.encode(locale.getpreferredencoding()))
+                except:
+                    sys.stdout.write(pretty.encode(locale.getpreferredencoding()))
+            else:
+                sys.stdout.write(pretty)
         else:
             if isFileDifferent(outfile, pretty):
                 mkdir_p(os.path.dirname(outfile))

tobbez avatar Jan 20 '17 23:01 tobbez

You're about 90% of the way having written the PR at this point. 😄

All that's needed adding file with the contents you've shown to js/test/resources

And then adding a couple tests test to python/jsbeautifier/tests/shell-smoke-test.sh and js/test/shell-smoke-test.sh

It's cool if you don't have time to do the last bit - I'll get to it soon. Thanks again!

bitwiseman avatar Jan 21 '17 00:01 bitwiseman

Hey @bitwiseman, did this get fixed by chance? I looked through recent changes here https://github.com/beautify-web/js-beautify/commits/master but didn't find any relevant ones.

I'm on: EditorConfig (0.12.1) jsbeautifier (1.6.12)

I tested it on Python 2.7.9 thru 2.7.13 on both brew and pkg's to try to get it to work, only to find that the bug seems to manifest always when redirecting the output, regardless of version.

It works as intended if I do not redirect the output (and the program outputs to stdout), but as soon as I pipe or redirect the output to anything, the program shatters (the output does not get correctly redirected) and emits to stderr a message such as:

'ascii' codec can't encode character u'\u....' in position 376942: ordinal not in range(128)

This makes using js-beautify an excellent adventure in copy/paste when beautifying larger files.

And just now reading @tobbez's above proposed change which I somehow failed to read before debugging it, that solution should fix it exactly.

Bump :)

fwhite-wsm avatar Apr 19 '17 04:04 fwhite-wsm

the bug is still there: js-beautifier -o test.b.js test.js works

js-beautifier >test.b.js test.js gives the error.

Zibri avatar Dec 17 '23 18:12 Zibri