closure-compiler
closure-compiler copied to clipboard
Piping Closure compiler stderr output to Python with Unicode characters on Windows problem
STR:
a.py
import subprocess
subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)
a.js
if (4 == NaN) console.log('á');
generates an error
C:\emsdk\emscripten\main>python a.py
Traceback (most recent call last):
File "C:\emsdk\emscripten\main\a.py", line 2, in <module>
subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)
File "C:\Python311\Lib\subprocess.py", line 550, in run
stdout, stderr = process.communicate(input, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\subprocess.py", line 1197, in communicate
stderr = self.stderr.read()
^^^^^^^^^^^^^^^^^^
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 135: invalid continuation byte
My impression here is that Closure has emitted the ISO-8859-1 encoding value of á
to stderr, which has the hex value of 0xe1
. However, the encoding='utf-8'
argument in Python expects the stderr to be printed out as UTF-8.
I could not find a command line directive in https://github.com/google/closure-compiler/wiki/Flags-and-Options to help control Closure stdout/stderr output encoding.
Which encoding does Closure use for stdout/stderr printing? Is it ISO-8859-1 by intent? Or should it have been UTF-8 and Closure accidentally printed out ISO-8859-1?