Reason behind pipx run forcing the subprocess' encoding to utf-8?
Hi. Today I stumbled upon a problem with pipx run: the Python tool I was running prints German umlauts like "ü", and they didn't show up correctly in the terminal, even though they did when I ran the raw Python script without pipx wrapped around it.
Turns out that the reason for this lies in pipx.util, where _fix_subprocess_env sets
env["PYTHONIOENCODING"] = "utf-8"
env["PYTHONLEGACYWINDOWSSTDIO"] = "utf-8"
in the environment passed on to the subprocess, and then exec_app calls
subprocess.run(..., encoding="utf-8")
to match that.
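To see what these settings do to a child interpreter, here is a minimal sketch. It is not pipx's actual code: the helper names forced_utf8_env and child_stdout_encoding are made up for illustration, and the child's output is captured through a pipe instead of going to a console. It only shows that PYTHONIOENCODING fixes the child's sys.stdout encoding regardless of the terminal's code page.

```python
import os
import subprocess
import sys


def forced_utf8_env(base_env):
    """Return a copy of base_env with the two variables quoted above set.

    Hypothetical helper for illustration, not the pipx implementation.
    """
    env = dict(base_env)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONLEGACYWINDOWSSTDIO"] = "utf-8"
    return env


def child_stdout_encoding(env):
    """Ask a child interpreter which encoding it chose for sys.stdout."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,  # a pipe here, unlike the console case in the report
        encoding="utf-8",
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("forced:", child_stdout_encoding(forced_utf8_env(os.environ)))
```

As far as I can tell, encoding= on subprocess.run only applies to pipes that run itself opens, so when the child writes straight to the console it is the environment variable that decides which bytes reach the terminal.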
The problem is that my German Windows terminal (cmd.exe) uses CP850 rather than UTF-8, so anything Python emits as UTF-8 looks like gibberish in my terminal.
I'd like to know whether there was a specific reason for forcing the encoding here, or whether anything speaks against simply dropping these settings so that Python can detect and use the terminal's encoding, which works nicely in my case.
Thanks and cheers.
Hmm yes, see https://github.com/pypa/pipx/pull/335#discussion_r366156303 and the surrounding discussion for context.
Thanks for the link, @chrysle. Unfortunately to me the commit message doesn't make clear why this was added, and it doesn't seem related to the issue that it fixes. To make the whole thing more concrete:
pipx run cowsay -t "hello äöü"
prints
  _________
| hello ├ñ├Â├╝ |
  =========
         \
          \
            ^__^
            (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
on my machine (Win10, cmd.exe in Windows Terminal, chcp says 850, pipx 1.5.0, Python 3.11.5), whereas just
cowsay -t "hello äöü"
in the same terminal prints everything correctly. And removing the above lines related to the subprocess encoding fixes this.
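The garbled characters in the cowsay output above are what you get when UTF-8 bytes are displayed by a console that interprets them as CP850. A minimal sketch of that mismatch in plain Python (no pipx involved; the variable names are just for illustration):

```python
text = "hello äöü"

# What the child process writes when it is forced to use UTF-8.
utf8_bytes = text.encode("utf-8")

# What a CP850 console shows when it interprets those bytes.
as_seen_in_cp850 = utf8_bytes.decode("cp850")
print(as_seen_in_cp850)  # hello ├ñ├Â├╝

# If the child encodes with the console's own code page, the text survives.
round_trip = text.encode("cp850").decode("cp850")
print(round_trip)  # hello äöü
```

Each umlaut becomes two bytes in UTF-8, and CP850 renders each of those bytes as a separate character, which is why three letters turn into six.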
> Unfortunately to me the commit message doesn't make clear why this was added, and it doesn't seem related to the issue that it fixes.
As stated in https://github.com/pypa/pipx/pull/335#discussion_r366164868, this was added to prevent edge cases that might otherwise occur. Normally you're on the safe side with UTF-8 because it is so widespread, but I agree the behaviour you're experiencing is unpleasant. We should probably make pipx's output encoding configurable, using an environment variable prefixed with PIPX_ to avoid unintended behaviour originating from a user-specified PYTHONIOENCODING.
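One way the proposal above could look, as a rough sketch only. Nothing here exists in pipx: the variable name PIPX_OUTPUT_ENCODING, the special value "auto", the UTF-8 default when the variable is unset, and the helper subprocess_env are all assumptions made for illustration. The sketch accepts plain encoding names only, not the "encoding:errors" form that PYTHONIOENCODING itself allows.

```python
import codecs

# Hypothetical variable name; the thread only says "prefixed PIPX_".
OVERRIDE_VAR = "PIPX_OUTPUT_ENCODING"


def subprocess_env(base_env):
    """Build the child environment from base_env (a mapping of str to str).

    Assumed semantics, not pipx behaviour:
      - variable unset: keep forcing UTF-8
      - "auto":         leave PYTHONIOENCODING out so Python detects the
                        terminal's encoding itself
      - anything else:  use it as the encoding, after checking it is known
    A PYTHONIOENCODING set by the user is always overwritten or removed.
    """
    env = dict(base_env)
    choice = env.get(OVERRIDE_VAR, "utf-8")
    if choice.lower() == "auto":
        env.pop("PYTHONIOENCODING", None)
    else:
        codecs.lookup(choice)  # raises LookupError for an unknown encoding
        env["PYTHONIOENCODING"] = choice
    return env
```

With this, a CP850 user could set the variable to "auto" (or "cp850") once, while everyone else keeps the UTF-8 default.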
@J3ronimo Is the problem with subprocess encoding the same as #1358? (Maybe one could be closed as a dupe, but the issue description you wrote is more helpful for understanding the problem.)
pipx should work with the Windows 10 default OEM code page and encoding rather than assuming that the console window's code page is UTF-8, since most users will not have changed the default.
My PowerShell can print emoji to the screen, and so can Python, but pipx can't until I change OutputEncoding:
$ [console]::OutputEncoding.BodyName
ibm437
$ cat a.py
print("💩")
$ python a.py
💩
$ pipx run a.py
💩
$ [console]::OutputEncoding = [System.Text.Encoding]::UTF8
$ pipx run a.py
💩
I'm not confident the issue is directly related to #335, because I can't reproduce the problem just by changing the arguments to subprocess.run:
>>> import subprocess
>>> subprocess.run(["python", "a.py"])
💩
CompletedProcess(args=['python', 'a.py'], returncode=0)
>>> subprocess.run(["python", "a.py"], encoding="utf-8")
💩
CompletedProcess(args=['python', 'a.py'], returncode=0)
>>> import os
>>> env = dict(os.environ)
>>> env["PYTHONIOENCODING"] = "utf-8"
>>> subprocess.run(["python", "a.py"], env=env, encoding="utf-8")
💩
CompletedProcess(args=['python', 'a.py'], returncode=0)
>>> subprocess.run([r"C:\Users\cwalsh\scoop\shims\pipx.cmd", "run", "a.py"])
💩
CompletedProcess(args=['C:\\Users\\cwalsh\\scoop\\shims\\pipx.cmd', 'run', 'a.py'], returncode=0)
>>> exit()
$ python C:\Users\cwalsh\scoop\apps\pipx\current\pipx.pyz run a.py
💩