virtualenv
Interpreter discovery bug wrt. Microsoft Store shortcut
Issue
hatch uses virtualenv's interpreter discovery when creating its virtual environments. The discovery also finds the Microsoft Store python shortcut. Even though the interpreter was not installed from the MS Store, this executable is used during discovery to run virtualenv's py_info.py script. In this setting, hatch creates its venv successfully (read: exit code 0), but the discovery raises a bunch of UnicodeDecodeErrors and spills them onto the terminal ☹️
hatch env create
Exception in thread Thread-6 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte
Exception in thread Thread-8 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte
The root cause seems to be that virtualenv spawns a subprocess for each interpreter it finds and has it execute the py_info.py script. On Windows it also tries this with the "mysterious" C:\Users\axel-kah\AppData\Local\Microsoft\WindowsApps\python.exe. If Python was not installed from the MS Store, this executable returns an error message in the infamous cp1252 encoding. When the OS language is set to something like German, the error message contains umlauts such as ü, whose cp1252 bytes are not valid UTF-8, which results in the UnicodeDecodeErrors.
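The decoding failure itself can be reproduced without the store shortcut. The following is a minimal sketch, assuming the stub's output is cp1252 as described above; the message text is made up for illustration and is not the real stub output.

```python
# Made-up German text standing in for the stub's error message (an assumption,
# the real message is not quoted in this report).
message = "nicht gefunden, bitte prüfen"
raw = message.encode("cp1252")

# In cp1252, "ü" is the single byte 0xfc, which can never start a UTF-8 sequence.
assert b"\xfc" in raw

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the encoding that produced the bytes recovers the text.
decoded_cp1252 = raw.decode("cp1252")

print(utf8_error)
print(decoded_cp1252)
```

The error printed here has the same shape as the one in the traceback: byte 0xfc rejected as an invalid start byte.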
Proposed Fix
On Windows, use cp1252 instead of utf-8 as the encoding when launching the subprocesses during discovery; keep utf-8 on all other platforms.
I have verified the fix by locally patching a dev install of hatch and could submit a PR.
Environment
Provide at least:
- OS: win11 (german language(!))
- hatch 1.13.0
- virtualenv 20.28.0
- pip list of the host python where virtualenv is installed:
"C:\Users\axel-kah\AppData\Local\pyapp\data\hatch\5730184961401994386\1.13.0\python\python.exe" -m pip list
Package Version
----------------- ----------
anyio 4.6.2
certifi 2024.8.30
click 8.1.7
colorama 0.4.6
distlib 0.3.9
filelock 3.16.1
h11 0.14.0
hatch 1.13.0
hatchling 1.25.0
httpcore 1.0.6
httpx 0.27.2
hyperlink 21.0.0
idna 3.10
jaraco.classes 3.4.0
jaraco.context 6.0.1
jaraco.functools 4.1.0
keyring 25.4.1
markdown-it-py 3.0.0
mdurl 0.1.2
more-itertools 10.5.0
packaging 24.1
pathspec 0.12.1
pexpect 4.9.0
pip 24.0
platformdirs 4.3.6
pluggy 1.5.0
ptyprocess 0.7.0
Pygments 2.18.0
pywin32-ctypes 0.2.3
rich 13.9.2
setuptools 69.1.0
shellingham 1.5.4
sniffio 1.3.1
tomli_w 1.1.0
tomlkit 0.13.2
trove-classifiers 2024.10.13
userpath 1.9.2
uv 0.4.20
virtualenv 20.26.6
zstandard 0.23.0
Output of the virtual environment creation
Not applicable because venv is created implicitly by hatch.
I came here from pyenv-win. Mind trying a different patch?
Keep the encoding as utf-8, but also pass errors="backslashreplace". (see https://docs.python.org/3/library/codecs.html#error-handlers and https://docs.python.org/3/library/subprocess.html#popen-constructor)
Keep the encoding as utf-8, but also pass errors="backslashreplace"
Seems to work just as well. Maybe Bernát can make a call on how he would like to have this handled, once he's back.
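For reference, the errors="backslashreplace" variant can be sketched as follows. This is only an illustration, not virtualenv's actual code: the child process below is a stand-in for the store stub and simply writes one cp1252-encoded byte to stderr.

```python
import subprocess
import sys

# Stand-in for the store stub (hypothetical): writes the bytes b"f\xfcr" to stderr.
child_code = r"import sys; sys.stderr.buffer.write(b'f\xfcr')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    encoding="utf-8",            # encoding stays utf-8 on every platform
    errors="backslashreplace",   # undecodable bytes become \xNN escapes
    check=False,
)

print(result.stderr)
```

With this handler the reader thread no longer raises; the undecodable byte survives as an escape sequence in the captured text, so nothing is silently dropped.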
Hi @axel-kah,
Here's a suggested solution for this issue:
# in virtualenv/discovery/cached_py_info.py
import sys
import subprocess
from subprocess import PIPE
# ... other imports


def from_exe(exe, env=None, raise_on_error=True):
    """
    Given a python executable, get the python information as a dictionary
    """
    cmd = [exe, "-c", PY_INFO_CODE]
    env = env or {}
    # NEW: Set encoding to cp1252 on Windows
    encoding = "cp1252" if sys.platform == "win32" else "utf-8"
    try:
        # MODIFIED: capture raw bytes and decode them explicitly below
        process = subprocess.Popen(cmd, stdout=PIPE, stderr=PIPE, env=env)
    except OSError as os_error:
        if raise_on_error:
            raise
        return {"error": os_error}
    out, err = process.communicate()
    try:  # MODIFIED
        out = out.decode(encoding)
        err = err.decode(encoding)
    except UnicodeDecodeError:  # catch the exception and return meaningful error information
        return {"error": f"Failed to decode output using {encoding} encoding: {out!r}, {err!r}"}
    # ... rest of the function (unchanged)
Explanation:
The original code in virtualenv's cached_py_info.py uses UTF-8 encoding to decode the output of the subprocess that runs py_info.py. However, the Microsoft Store python stub executable, when invoked incorrectly, outputs error messages in CP1252 encoding on Windows. This mismatch causes the UnicodeDecodeError.
The solution changes the decoding to use CP1252 on Windows. The modified line specifically sets the encoding variable based on the platform:
encoding = "cp1252" if sys.platform == "win32" else "utf-8"
Then, the output of the subprocess is decoded using this dynamically determined encoding:
try:
    out = out.decode(encoding)
    err = err.decode(encoding)
except UnicodeDecodeError:  # Handles potential issues even with cp1252
    return {"error": f"Failed to decode output using {encoding} encoding: {out!r}, {err!r}"}
This allows the code to correctly handle the output from the Microsoft Store Python stub, even if it contains bytes that are not valid UTF-8. The try...except block is also added to catch a potential UnicodeDecodeError even with cp1252 and return a more informative error message in such cases. This adds a layer of robustness to the decoding process.
This fix targets the root cause of the issue within virtualenv itself, ensuring that the interpreter discovery process can correctly handle the output from the Microsoft Store Python stub and prevents the UnicodeDecodeError from occurring. The change is localized and doesn't affect the behavior of virtualenv on other platforms.
However, the Microsoft Store python stub executable, when invoked incorrectly, outputs error messages in CP1252 encoding on Windows
Is that true in all locales, or does it actually use the appropriate locale encoding (which would be more in line with general Windows practice)? Would it be better here to use the locale encoding?
On the other hand, if we don’t care about non-ASCII bytes in the output, and all that matters is to avoid decode errors, any encoding for which all bytes are valid would work. In that case, CP1252 should be fine, although Latin-1 is conventionally used because it maps all bytes to the same code point.
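The two options raised here can be compared directly in Python. A small sketch, assuming CPython's bundled codecs: it lists the bytes that the cp1252 codec rejects, checks that latin-1 accepts every byte, and looks up the locale's preferred encoding via locale.getpreferredencoding.

```python
import locale

all_bytes = bytes(range(256))

# latin-1 maps each byte to the code point with the same number, so it never fails.
latin1_text = all_bytes.decode("latin-1")
latin1_roundtrips = [ord(ch) for ch in latin1_text] == list(range(256))

# Python's cp1252 codec leaves a few byte values undefined and raises on them.
cp1252_undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        cp1252_undefined.append(value)

# The encoding the current locale prefers (varies by machine and OS settings).
locale_encoding = locale.getpreferredencoding(False)

print(latin1_roundtrips)
print([hex(value) for value in cp1252_undefined])
print(locale_encoding)
```

So if the only goal is to avoid decode errors, latin-1 (or an explicit error handler) is the safer choice in Python, since strict cp1252 decoding can still raise on a handful of byte values.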