
Support hash checking against `RECORD`

Open charliermarsh opened this issue 1 year ago • 4 comments

We should validate the hash of each individual file in the wheel against the hash recorded in `RECORD`. (This is distinct from the hash-checking mode described in https://github.com/astral-sh/puffin/issues/131 and https://github.com/astral-sh/puffin/issues/474.)
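For reference, here is a minimal Python sketch of that per-file check. It is not uv's implementation, which is in Rust. Each `RECORD` row has the form `path,algorithm=digest,size`, where the digest is URL-safe base64 with the `=` padding stripped. The names `record_digest` and `verify_wheel` are hypothetical, and the sketch assumes the wheel has a single top-level `*.dist-info/RECORD`.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Encode a digest as RECORD does: URL-safe base64 with '=' padding stripped."""
    digest = hashlib.new(algorithm, data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_wheel(wheel: zipfile.ZipFile) -> list[str]:
    """Return the paths whose contents do not match their RECORD hash."""
    # Assumes exactly one top-level *.dist-info directory containing RECORD.
    record_name = next(
        name
        for name in wheel.namelist()
        if name.endswith(".dist-info/RECORD") and name.count("/") == 1
    )
    record_text = wheel.read(record_name).decode("utf-8")
    mismatches = []
    for row in csv.reader(io.StringIO(record_text)):
        if len(row) < 2 or not row[1]:
            # Blank lines and unhashed entries (RECORD lists itself with no hash).
            continue
        path, hash_field = row[0], row[1]
        algorithm, _, expected = hash_field.partition("=")
        try:
            actual = record_digest(wheel.read(path), algorithm)
        except KeyError:
            # Listed in RECORD but missing from the archive.
            mismatches.append(path)
            continue
        if actual != expected:
            mismatches.append(path)
    return mismatches
```

As a sanity check, `record_digest(b"")` yields `47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`, the value that commonly appears in `RECORD` for empty `__init__.py` files.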

charliermarsh, Jan 11 '24 17:01

During installation or for an existing venv?

install-wheel-rs can do this during installation, but it's turned off by default for performance reasons (SHA-256 is slow) and because pip also didn't validate hashes the last time I checked.

For an existing venv, https://github.com/konstin/poc-monotrail/blob/main/crates/monotrail/src/verify_installation.rs implements this.

konstin, Jan 11 '24 17:01

For us, installation is linking, so it should really happen when unzipping into the cache.
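If the check happens while unzipping into the cache, each file can be hashed in the same pass that writes it to disk. That avoids a second read and addresses part of the performance concern. The following is a hypothetical Python sketch: `extract_and_verify` is an illustrative name, not a uv or install-wheel-rs API. The sketch assumes one top-level `*.dist-info/RECORD` and extracts files that have no `RECORD` hash without checking them.

```python
import base64
import csv
import hashlib
import io
import zipfile
from pathlib import Path, PurePosixPath


def extract_and_verify(wheel: zipfile.ZipFile, dest: Path, chunk_size: int = 1 << 16) -> None:
    """Extract every file in `wheel` under `dest`, hashing each one as it is written.

    Raises ValueError on an unsafe path or on the first RECORD hash mismatch.
    """
    record_name = next(
        name
        for name in wheel.namelist()
        if name.endswith(".dist-info/RECORD") and name.count("/") == 1
    )
    # Map each path to its (algorithm, expected digest); unhashed rows are skipped.
    expected = {}
    for row in csv.reader(io.StringIO(wheel.read(record_name).decode("utf-8"))):
        if len(row) >= 2 and row[1]:
            algorithm, _, digest = row[1].partition("=")
            expected[row[0]] = (algorithm, digest)

    for info in wheel.infolist():
        if info.is_dir():
            continue
        member = PurePosixPath(info.filename)
        if member.is_absolute() or ".." in member.parts:
            raise ValueError(f"unsafe path in wheel: {info.filename}")
        target = dest.joinpath(*member.parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        entry = expected.get(info.filename)
        hasher = hashlib.new(entry[0]) if entry is not None else None
        with wheel.open(info) as src, open(target, "wb") as out:
            # One pass: bytes are hashed while being copied, so there is no second read.
            while chunk := src.read(chunk_size):
                out.write(chunk)
                if hasher is not None:
                    hasher.update(chunk)
        if entry is not None:
            actual = base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=").decode("ascii")
            if actual != entry[1]:
                raise ValueError(f"hash mismatch for {info.filename}")
```

On a mismatch, this sketch leaves the partially extracted files in `dest`. A cache would presumably extract into a temporary directory and move it into place only after every file has been verified.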

charliermarsh, Jan 11 '24 17:01

Or we could do it after the fact, as part of our venv validation...
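An after-the-fact check of an existing venv could walk each installed distribution's `*.dist-info/RECORD` and re-hash the files it lists. The paths in `RECORD` are relative to `site-packages`. The following is a rough Python sketch. The names `file_record_digest` and `verify_site_packages` are hypothetical, and the code is not a port of the monotrail `verify_installation.rs` mentioned earlier in this thread.

```python
import base64
import csv
import hashlib
from pathlib import Path


def file_record_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks and encode it as RECORD does (URL-safe base64, no padding)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            hasher.update(chunk)
    return base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=").decode("ascii")


def verify_site_packages(site_packages: Path) -> dict[str, list[str]]:
    """Map each dist-info directory name to the RECORD paths that are missing or modified."""
    problems: dict[str, list[str]] = {}
    for record in sorted(site_packages.glob("*.dist-info/RECORD")):
        bad = []
        with open(record, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) < 2 or not row[1]:
                    continue  # blank line or unhashed entry (e.g. RECORD itself)
                algorithm, _, expected = row[1].partition("=")
                # RECORD paths are relative to site-packages (scripts may use ../).
                path = site_packages / row[0]
                if not path.is_file() or file_record_digest(path, algorithm) != expected:
                    bad.append(row[0])
        if bad:
            problems[record.parent.name] = bad
    return problems
```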

charliermarsh, Jan 11 '24 17:01

Related topic (let me know if I should create a separate issue):

Currently, most Python packaging tools have no guardrails against files from one wheel clobbering files from another. This is effectively undefined behavior: the final venv differs depending on the order in which wheels are installed. Some packages in the wild ship files that clobber each other without causing problems in practice, such as empty `__init__.py` files (most notably in the Google Cloud family of packages). There are also cases where people put a `README.md` in the top-level folder, so it ends up at `site-packages/README.md` and conflicts.

@hauntsaninja proposed doing a hash check against the `RECORD` entry, perhaps with a flag to fail loudly on a violation. That would still allow empty `__init__.py` files to clobber each other (their hashes match) while preventing the undefined behavior.
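One way the proposed check could work when an installer is about to write a file that already exists: hash the existing file and compare it with the incoming wheel's `RECORD` hash for that path. Identical contents, such as two empty `__init__.py` files, pass. Anything else either fails or warns. This is a Python sketch under those assumptions. `install_file`, `ClobberError`, and the `strict` parameter are hypothetical stand-ins for whatever flag uv might add.

```python
import base64
import hashlib
import warnings
from pathlib import Path


class ClobberError(Exception):
    """Raised when a wheel would overwrite an existing file with different contents."""


def record_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Encode a digest as RECORD does: URL-safe base64 with '=' padding stripped."""
    return base64.urlsafe_b64encode(hashlib.new(algorithm, data).digest()).rstrip(b"=").decode("ascii")


def install_file(target: Path, data: bytes, record_hash: str, strict: bool = True) -> None:
    """Write `data` to `target` unless that would silently clobber a different file.

    `record_hash` is this file's RECORD hash field, e.g. "sha256=<digest>".
    """
    algorithm, _, expected = record_hash.partition("=")
    # The incoming bytes must match their own RECORD entry first.
    if record_digest(data, algorithm) != expected:
        raise ValueError(f"{target}: contents do not match RECORD")
    if target.exists():
        if record_digest(target.read_bytes(), algorithm) == expected:
            # Same bytes already present (e.g. an empty __init__.py shipped by several packages).
            return
        if strict:
            raise ClobberError(f"{target} already exists with different contents")
        warnings.warn(f"overwriting {target} with different contents")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
```

With `strict=True`, the `README.md` case from the comment above fails on the second package. Under the same hashes, the shared empty `__init__.py` files install without complaint.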

What do you think about this idea? Is there a better way to achieve stricter semantics?

vors, Aug 13 '24 19:08