uv
Support hash checking against `RECORD`
We should validate the hash of each individual file in the wheel against the hash recorded in RECORD. (This is distinct from the hash-checking mode described in https://github.com/astral-sh/puffin/issues/131 and https://github.com/astral-sh/puffin/issues/474.)
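For illustration, here is a minimal sketch of what per-file validation against `RECORD` could look like, written in Python with only the standard library. It assumes the wheel's RECORD is a CSV of `path,hash,size` rows where the hash is `algorithm=` followed by the urlsafe base64 digest without padding, as the wheel spec describes. The names `record_hash` and `verify_wheel` are hypothetical and are not part of uv or install-wheel-rs.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Encode a digest the way RECORD does: urlsafe base64 without '=' padding."""
    digest = hashlib.new(algorithm, data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{algorithm}={encoded}"


def verify_wheel(wheel: zipfile.ZipFile) -> list[str]:
    """Return the paths whose contents do not match their RECORD entry."""
    # Assumption: exactly one *.dist-info/RECORD exists in the archive.
    record_name = next(
        name for name in wheel.namelist() if name.endswith(".dist-info/RECORD")
    )
    rows = csv.reader(io.StringIO(wheel.read(record_name).decode("utf-8")))
    mismatches = []
    for row in rows:
        if not row:
            continue
        path = row[0]
        expected = row[1] if len(row) > 1 else ""
        if not expected:
            # RECORD itself is listed without a hash, so there is nothing to check.
            continue
        algorithm = expected.split("=", 1)[0]
        if record_hash(wheel.read(path), algorithm) != expected:
            mismatches.append(path)
    return mismatches
```

Calling `verify_wheel(zipfile.ZipFile("some.whl"))` returns an empty list when every hashed entry matches. A stricter version would probably also report files that are present in the archive but missing from RECORD.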
During installation or for an existing venv?
install-wheel-rs can do this during installation, but it's turned off by default for performance reasons (SHA-256 is slow) and because pip also didn't validate hashes the last time I checked.
For an existing venv, https://github.com/konstin/poc-monotrail/blob/main/crates/monotrail/src/verify_installation.rs implements this.
For us, installation is linking, so it should really happen when unzipping into the cache.
Or we could do it after-the-fact as part of our venv validation...
Related topic (let me know if I should create a separate issue):
Currently, most Python packaging tooling doesn't have any guardrails against files from one wheel clobbering files from another. This is basically undefined behavior: depending on the installation order, the final venv will be different. There are packages in the wild that include things like empty __init__.py files that do clobber each other, and it's not a problem in practice (most notably the Google Cloud family of packages). There are also cases where people put a README.md in the top-level folder, so it ends up at site-packages/README.md and conflicts.
@hauntsaninja proposed that we could do a hash check against the RECORD file entry and maybe have a flag to fail loudly in case of violation -- that should simultaneously allow clobbering of empty __init__.py files and prevent undefined behaviors.
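To make the proposal concrete, here is a rough Python sketch of a RECORD-based clobber check. It assumes the RECORD entries of each wheel have already been parsed into a `{path: hash}` mapping. The function name `find_clobbers` and the input shape are hypothetical, chosen only for illustration.

```python
def find_clobbers(
    records: dict[str, dict[str, str]],
) -> list[tuple[str, str, str]]:
    """Find paths installed by more than one wheel with differing content.

    `records` maps a wheel name to its parsed RECORD entries ({path: hash}).
    Returns (path, first_wheel, conflicting_wheel) tuples. Paths whose hashes
    are identical across wheels (e.g. empty __init__.py files) are allowed.
    """
    seen: dict[str, tuple[str, str]] = {}
    conflicts = []
    for wheel, entries in records.items():
        for path, digest in entries.items():
            if path in seen and seen[path][1] != digest:
                conflicts.append((path, seen[path][0], wheel))
            # Remember only the first wheel that provided each path.
            seen.setdefault(path, (wheel, digest))
    return conflicts
```

With a flag to fail loudly, an installer could raise an error whenever this list is non-empty, while identical files shared between wheels would still pass.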
What do you think about this idea? Maybe there is a better way to achieve stricter semantics?