
Problem piping Closure Compiler stderr output containing Unicode characters to Python on Windows

Open • juj opened this issue 3 months ago • 6 comments

STR:

a.py

import subprocess
subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)

a.js

if (4 == NaN) console.log('á');

generates an error

C:\emsdk\emscripten\main>python a.py
Traceback (most recent call last):
  File "C:\emsdk\emscripten\main\a.py", line 2, in <module>
    subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)
  File "C:\Python311\Lib\subprocess.py", line 550, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\subprocess.py", line 1197, in communicate
    stderr = self.stderr.read()
             ^^^^^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 135: invalid continuation byte

My impression is that Closure wrote á to stderr as the single byte 0xe1, which is its ISO-8859-1 encoding (in UTF-8 it would be the two bytes 0xc3 0xa1). The encoding='utf-8' argument makes Python decode stderr as UTF-8, so the lone 0xe1 byte fails to decode.
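The mismatch can be reproduced in isolation, without running Closure at all. The following is a minimal sketch; the `snippet` bytes are a made-up stand-in for the part of the stderr output that quotes the source line, and `try_decode` is a helper introduced here only for illustration.

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, returning the error reason instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# "\u00e1" is the character á, written as an escape so the example
# does not depend on how this file itself is saved.
latin1_bytes = "\u00e1".encode("iso-8859-1")  # single byte 0xe1
utf8_bytes = "\u00e1".encode("utf-8")         # two bytes 0xc3 0xa1

# Hypothetical stand-in for the stderr fragment that echoes the source line.
snippet = b"console.log('" + latin1_bytes + b"');"

print(latin1_bytes, utf8_bytes)
print(try_decode(snippet, "utf-8"))       # same failure reason as the traceback
print(try_decode(snippet, "iso-8859-1"))  # decodes cleanly
```

Note that 0xe1 also maps to á in Windows-1252, so the byte alone does not prove which single-byte encoding was used to write stderr.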

I could not find a command line directive in https://github.com/google/closure-compiler/wiki/Flags-and-Options to help control Closure stdout/stderr output encoding.
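Without such a flag, one possible caller-side workaround is to capture stderr as raw bytes and decode it in Python with a fallback. This is only a sketch under assumptions: `decode_with_fallback` and `run_capturing_stderr` are hypothetical helpers, and the choice of `cp1252` as the second candidate encoding is a guess, not something Closure documents.

```python
import subprocess


def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try each candidate encoding strictly, then fall back to lossy UTF-8."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: substitute U+FFFD for undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def run_capturing_stderr(args):
    """Run a command and return (returncode, decoded stderr text).

    Omitting encoding= makes subprocess.run return stderr as bytes,
    so no decoding happens inside communicate().
    """
    proc = subprocess.run(args, stderr=subprocess.PIPE, shell=True)
    return proc.returncode, decode_with_fallback(proc.stderr)
```

A simpler variant is to keep `encoding='utf-8'` and additionally pass `errors='replace'` to `subprocess.run`, which avoids the exception at the cost of showing a replacement character in place of á.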

Which encoding does Closure use for stdout/stderr printing? Is it ISO-8859-1 by intent? Or should it have been UTF-8 and Closure accidentally printed out ISO-8859-1?

juj, Mar 06 '24 11:03