`pip wheel` produces a "Hashes are required" error when building a wheel from a local sdist

Open alex opened this issue 1 year ago • 37 comments

Description

An invocation like the following: pip wheel -c constraints-file-with-hashes.txt local-sdist.tar.gz produces an error like:

ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    file:///D:/a/cryptography/cryptography/cryptography-44.0.0.dev1.tar.gz --hash=sha256:e85c67eb1a045652bb850f443ae24004b618aca6df8c642a8e7a977f90f16afb

Note that the package which is missing the hash is the local sdist.

Expected behavior

pip should enforce hashes for any downloaded/remote packages, but should not require a hash for the local sdist.

pip version

24.2

Python version

3.11.9

OS

All

How to Reproduce

  1. Download a local sdist: pip download --no-binary :all: --no-deps cryptography
  2. Create a constraints file with hashes
  3. pip wheel -c constraints-file-with-hashes.txt cryptography*.tar.gz

Output

No response

alex avatar Aug 28 '24 12:08 alex

Does the constraints file contain --require-hashes?

notatallshaw avatar Aug 28 '24 23:08 notatallshaw

Yes

alex avatar Aug 28 '24 23:08 alex

Sorry, to be more precise, the constraints file contains hashes. It doesn't have --require-hashes, but I believe those are equivalent.

alex avatar Aug 29 '24 19:08 alex

Is it possible to have a self-contained reprod?

uranusjr avatar Aug 30 '24 03:08 uranusjr

(tempenv-6e562856703b6) ~/.v/tempenv-6e562856703b6 ❯❯❯ pip download --no-binary :all: --no-deps pretend
Collecting pretend
  File was already downloaded /Users/alex_gaynor/.virtualenvs/tempenv-6e562856703b6/pretend-1.0.9.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Successfully downloaded pretend
(tempenv-6e562856703b6) ~/.v/tempenv-6e562856703b6 ❯❯❯ pip wheel --require-hashes ./pretend-1.0.9.tar.gz
Processing ./pretend-1.0.9.tar.gz
  File was already downloaded /Users/alex_gaynor/.virtualenvs/tempenv-6e562856703b6/pretend-1.0.9.tar.gz
ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    file:///Users/alex_gaynor/.virtualenvs/tempenv-6e562856703b6/pretend-1.0.9.tar.gz --hash=sha256:c90eb810cde8ebb06dafcb8796f9a95228ce796531bc806e794c2f4649aa1b10

alex avatar Aug 30 '24 03:08 alex

pip wheel --require-hashes ./pretend-1.0.9.tar.gz

In this case the local sdist does not have a hash, so pip’s complaint is not groundless (whether a local sdist needs a hash is another question). I thought in the original issue the local sdist does have a hash but pip fails to recognise it?

uranusjr avatar Aug 30 '24 04:08 uranusjr

No, there's no hash for the local sdist.

Personally, I find it surprising and unexpected that a hash is required for a local file.

alex avatar Aug 30 '24 11:08 alex

While I can see the argument one way or the other, I have worked in teams where the repository is network-attached storage, and if they were concerned about data integrity and hashed their contents before copying them to the network, they would expect --require-hashes to enforce that check.

How practical this scenario is I don’t know, but usually every feature of pip is relied on by someone.

notatallshaw avatar Aug 30 '24 13:08 notatallshaw

I'm reticent to suggest a flag, since that's just Yet Another Thing for users. But if the existing behavior is desired, then maybe this is a feature request for some way to disable this behavior, and only enforce hashes for PyPI packages.

alex avatar Aug 30 '24 13:08 alex

only enforce hashes for PyPI packages

To clarify what you intend here, do you mean just PyPI, or any index (specified via --index-url and/or --extra-index-url)? What about "informal" repositories specified via --find-links? Also, there's the possibility of requirements specified by (local) file path or URL - would both of those be exempt from hash checks?

I sympathise with the idea that hashes are more important for some sources than for others, but I'm not at all clear where we draw the line - and I don't personally use hashes, so I have to be guided by what our users seem to want, which mostly feels like "hashes enforced everywhere, except for the occasional place that I don't want them to be enforced"[^1].

[^1]: Sorry, that comes across as a bit facetious or dismissive, but it's genuinely hard to pin down a clear rule that people agree with.

pfmoore avatar Aug 30 '24 14:08 pfmoore

Sorry, I should have said index-provided.

I don't know what to do about the general case of local file path.

It seems clear (to me at least?) that "the sdist I'm building a wheel out of" is a distinct case from the more general pip install use case.

alex avatar Aug 30 '24 14:08 alex

To understand your workflow a bit better, why are you using --require-hashes with pip wheel for a local sdist?

If your goal is to build the sdist and you're not worried about its data integrity, why not just build it without resolving its dependencies? i.e. drop --require-hashes and add --no-deps.

notatallshaw avatar Aug 30 '24 14:08 notatallshaw

Because I want to pin the versions of build-system.requires. In my actual use case, the sdist is produced by a previous step in the CI system. See https://github.com/pyca/cryptography/pull/11500

alex avatar Aug 30 '24 14:08 alex

I wasn't aware that pip wheel passed -c constraints to the build requirements. Is that correct? It doesn't for pip install; you have to use the environment variable PIP_CONSTRAINT.

If I am reading this correctly, you are building the wheel(s), copying the wheel(s) to another location to be used, and then the venv is not used further? Perhaps as a workaround, you could:

  1. Install your pinned build dependencies: python -m pip install --require-hashes -r ${{ env.BUILD_REQUIREMENTS_PATH }}
  2. Build your wheel(s) offline with no isolation: python -m pip wheel -v --no-deps --no-index --no-build-isolation cryptography*.tar.gz $PY_LIMITED_API -w dist/

I do something very similar for one of my build steps, I don't need build isolation because I can already create a reproducible pinned build environment via the docker steps, and I'm not reusing that environment for anything else. Hope this helps anyway.

notatallshaw avatar Aug 30 '24 14:08 notatallshaw

I suppose it's possible -c doesn't work and I need to use PIP_CONSTRAINT, but that seems orthogonal to this. (It's also a fairly significant foot-gun, but that's also orthogonal!)

I agree that it's possible to work around this by simply not relying on build isolation, but this increases the complexity of the build. It really should be possible to build a wheel from an sdist while exerting precise control over all dependencies to be built. (If there's a better tool than pip for this, I'm happy to hear it, but I'm not aware of another.)

alex avatar Aug 30 '24 15:08 alex

(If there's a better tool than pip for this, I'm happy to hear it, but I'm not aware of another.)

If you're building a wheel, rather than installing, build (https://pypi.org/project/build/) might be better for you.

pfmoore avatar Aug 30 '24 15:08 pfmoore

build has no way to take an sdist as input, AFAICT.

alex avatar Aug 30 '24 15:08 alex

It really should be possible to build a wheel from an sdist while exerting precise control over all dependencies to be built.

I think it is (though I've not tried it with this sdist example), but it requires constructing a constraints.txt which includes a hash of your sdist and pointing the env variable PIP_CONSTRAINT to that file.

notatallshaw avatar Aug 30 '24 15:08 notatallshaw

You probably also need to set the env var for require hashes, I think PIP_REQUIRE_HASHES=1?

notatallshaw avatar Aug 30 '24 15:08 notatallshaw

-c doesn't work and I need to use PIP_CONSTRAINT, but that seems orthogonal to this. (It's also a fairly significant foot-gun, but that's also orthogonal!)

My understanding is that constraints pre-date isolated builds, and further, pip has no user-facing way to find out the required build dependencies of a package. Therefore, workflows which involve using pip freeze to generate pinned constraints might break if -c was passed to the isolated build environment, depending on the constraints generated and the options the user is using.

Pip-tools has a way of extracting build dependencies: https://github.com/jazzband/pip-tools?tab=readme-ov-file#maximizing-reproducibility, but relies on the same PIP_CONSTRAINT env variable when you sync your environment.

uv improves the situation by separating out regular constraints and build constraints: https://docs.astral.sh/uv/pip/compatibility/#build-constraints, but it's not clear to me from the docs if the CLI option is applied recursively or you need to use UV_BUILD_CONSTRAINT to ensure a build dependency's build dependencies are pinned (but I don't think uv provides a wheel option).

I've heard Bazel supports reproducible fully pinned Python projects, but I don't understand the tool well enough that looking at their documentation tells me if this is true or not.

notatallshaw avatar Aug 30 '24 16:08 notatallshaw

To take a step back here: My overall goal is to take a local sdist, build a wheel from it, and do so with any downloaded artifacts pinned to a version and hash verified.

The last element is presently an impediment because pip attempts to verify the hash of the sdist itself, which is not in the constraints file.

I have a lack of clarity about whether this is desired behavior by the pip maintainers, so I want to lay out three possible directions here:

  1. Verifying the hash of the local sdist is not intended or desired behavior: pip should stop checking the hash of a local sdist.
  2. Verifying the hash of a local sdist is either a) desired or b) not desired, but now part of the backwards compatibility surface for pip: pip should add a flag to disable performing this verification
  3. Verifying the hash of a local sdist is both intended and desired, and there is no interest in allowing it to be disabled: This issue should be wontfixed

Perhaps there's other options too, but I'd be interested in which direction the maintainers prefer.

alex avatar Aug 31 '24 16:08 alex

Perhaps there's other options too, but I'd be interested in which direction the maintainers prefer.

I can't speak for the other maintainers, but my personal view is somewhere between (2) and (3). I think that --require-hashes should mean what it says, and require hashes for everything. We document that --require-hashes "is implied when any package in a requirements file has a --hash option", and while constraints files aren't mentioned explicitly, we don't document much about constraint files in general, and I'd expect people to assume they work similarly to requirement files. So we'd be potentially breaking compatibility to change this, even if we wanted to.

In addition, as I said above, I think that pinning down precisely what the semantics of any potential "disable hash verification for local sdists" are would be both difficult to do, and difficult to document. So even if the consensus was (2), I'm against having an option unless someone can prove me wrong by specifying the behaviour clearly and unambiguously.

Having said all of this, I have little or no experience of actually using hash checking mode, so I'd defer to someone with real world experience if they said otherwise.

pfmoore avatar Aug 31 '24 21:08 pfmoore

I work at a place where security strongly recommends hash checking, but we also have several local requirements. I eventually worked around this issue by adding a flag to pip-compile. What I do now is this:

I keep a requirements.in file that lists the dependencies to install, including local packages. I use pip-compile (now uv pip compile) to convert requirements.in to requirements.txt, passing --exclude-package/--unsafe-package (the name varies between uv and pip-tools) to exclude local packages from the .txt file. Then I run pip install --no-deps -r requirements.txt followed by pip install --no-deps -r requirements.in (the second command installs the local packages).

A little convoluted, but I think the current hash-checking mode is mostly annoying when local/editable dependencies are mixed in, and it forces tricks like this. In the pip-tools issue about --unsafe-package, other people also commented that they use this kind of trick to work around the --require-hashes behavior.

So my own preference is option 1, which would make using editable/local packages easier, but for now I've found a workable alternative with multiple install commands/files that deals with this issue.

Before I found this solution, the security recommendation boiled down to: we lack a good way to handle this case and only see awkward choices.

edit: Glancing at how other teams in my company deal with this kind of issue, they either use multiple install commands/requirements files or don't use hashes. For the latter, I'm unsure whether they avoid hashes because of this issue or are simply unaware of hashes and the workarounds.

edit 2: Another possible solution is a flag like --no-hashes package-name that can be specified multiple times to state explicitly which packages should not have their hashes checked. That's roughly how the exclude-package approach works: no special logic for local/editable packages, but the user or script running the install can explicitly mark some packages as fine without a hash.
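The split between hashed and unhashed entries described above can be checked before running pip. The following is a rough Python sketch that partitions requirement lines by whether they carry a --hash= option. It is a naive line parser written for illustration, not pip's own parser; the function names logical_lines and split_by_hash are hypothetical, and it assumes hashes are written in the --hash=algo:digest form.

```python
from typing import Iterable, List, Tuple


def logical_lines(lines: Iterable[str]) -> List[str]:
    """Join backslash-continued lines; drop blanks and full-line comments."""
    result: List[str] = []
    buffer = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):
            # Continuation: keep accumulating into the same logical line.
            buffer += line[:-1] + " "
            continue
        buffer += line
        stripped = buffer.strip()
        if stripped and not stripped.startswith("#"):
            result.append(" ".join(stripped.split()))
        buffer = ""
    # Handle a file that ends in the middle of a continuation.
    stripped = buffer.strip()
    if stripped and not stripped.startswith("#"):
        result.append(" ".join(stripped.split()))
    return result


def split_by_hash(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (entries with a hash, entries without a hash)."""
    hashed: List[str] = []
    unhashed: List[str] = []
    for entry in logical_lines(lines):
        if entry.startswith("-"):
            continue  # option lines such as --index-url or -r are skipped
        (hashed if "--hash=" in entry else unhashed).append(entry)
    return hashed, unhashed


if __name__ == "__main__":
    import sys

    # Usage: python split_hashes.py requirements.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        with_hash, without_hash = split_by_hash(fh)
    for entry in without_hash:
        print(f"no hash: {entry}")
```

Any entry reported without a hash is one that hash-checking mode would reject, so those are the candidates for a separate install command or file.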

hmc-cs-mdrissi avatar Aug 31 '24 23:08 hmc-cs-mdrissi

Okay, I tried to create a workflow for OP without any workarounds or using other tools, and it's not clear to me it's even possible to pin build requirements in an isolated build environment with hashes? In short:

  1. Pip can only enforce pinned build dependencies with PIP_CONSTRAINT
  2. Pip will not take the hashes from a constraint file
  3. pyproject.toml build requirements can not include hashes

Example of trying to install an unhashed requirement with a hashed constraint:

  1. Create constraints.txt with the contents:
setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766
  2. Run pip install setuptools==74.1.1 -c constraints.txt, and get error:
ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766

Full example of minimal workflow

I'm using bash for this example, you'll need to adapt to whatever shell you use:

  1. mkdir minimal_project
  2. cd minimal_project
  3. Create pyproject.toml with following contents:
[build-system]
requires = ["setuptools==74.1.1", "wheel==0.44.0"]
build-backend = "setuptools.build_meta"

[project]
name = "minimal_project"
version = "0.1.0"
  4. mkdir -p src/minimal_project
  5. touch src/minimal_project/__init__.py
  6. python -m build --sdist
  7. Create build-constraints.txt with the contents:
setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766
wheel==0.44.0 --hash=sha256:2376a90c98cc337d18623527a97c31797bd02bad0033d41547043a1cbfbe448f
  8. Create a requirements file like so: echo "file://$(realpath dist/minimal_project-0.1.0.tar.gz) --hash=sha256:$(sha256sum dist/minimal_project-0.1.0.tar.gz | cut -d' ' -f1)" > sdist-requirements.txt
  9. Export build constraints: export PIP_CONSTRAINT="$PWD/build-constraints.txt"
  10. Attempt to build wheel: python -m pip wheel --no-deps -r sdist-requirements.txt and get error:
Processing ./dist/minimal_project-0.1.0.tar.gz
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [4 lines of output]
      Collecting setuptools==74.1.1
        Using cached setuptools-74.1.1-py3-none-any.whl (1.3 MB)
      ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
          setuptools==74.1.1 --hash=sha256:fc91b5f89e392ef5b77fe143b17e32f65d3024744fba66dc3afe07201684d766
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Am I missing something? Is it possible to use hashes with pip for build requirements at all? I think if hashes from constraints were accepted, then this workflow would work.

notatallshaw avatar Sep 05 '24 03:09 notatallshaw

On a side note for OP, uv just added uv build, which can build from sdists: https://github.com/astral-sh/uv/pull/6898. Combined with UV_BUILD_CONSTRAINT, that should get you what you want, but I haven't tried it.

notatallshaw avatar Sep 05 '24 03:09 notatallshaw

For build requirements, you can keep a separate build_requirements.txt file with hashes, install that file with --no-deps, and then run the main install with --no-build-isolation as a workaround.

How you even determine the build requirements is another problem, as pip-compile and similar tools only resolve install requirements, not build ones. In practice, though, my experience is that the build requirement list is usually very short, so I've just written it by hand.

hmc-cs-mdrissi avatar Sep 05 '24 04:09 hmc-cs-mdrissi

For build requirements, you can keep a separate build_requirements.txt file with hashes, install that file with --no-deps, and then run the main install with --no-build-isolation as a workaround.

Yes, I gave an example of that workflow earlier (https://github.com/pypa/pip/issues/12942#issuecomment-2321467988), but OP was unhappy with --no-build-isolation so I was seeing if it was possible to come up with some workflow that could pin build requirements with hashes using build isolation, and I guess the answer is no.

notatallshaw avatar Sep 05 '24 04:09 notatallshaw

This is basically a standards issue at the core. Hashes are a form of locking, and currently we have no package locking standard. In particular:

  1. The specification for requirements does not include a standardised way to include a file hash.
  2. The specification for pyproject.toml requires you to define the build dependencies as requirements.

The proposed lockfile standard, PEP 751, includes a section for specifying locked build requirements. That may help with this workflow, once the PEP gets approved and implemented.

pfmoore avatar Sep 05 '24 08:09 pfmoore

This is basically a standards issue at the core. Hashes are a form of locking, and currently we have no package locking standard

It's only a standards issue in the sense that pip's existing installer features don't currently work in this scenario, but if pip would allow a constraints file to constrain requirements via hashes this would solve this workflow without the need for a new standard.

The specification for requirements does not include a standardised way to include a file hash.

No, but pip documents how to pin a requirement via hashes (https://pip.pypa.io/en/stable/topics/secure-installs/), and it allows a user to specify them in a constraints file, but then it doesn't functionally let the user constrain to those hashes.

The proposed lockfile standard, PEP 751, includes a section for specifying locked build requirements. That may help with this workflow, once the PEP gets approved and implemented.

For a lock file this seems a little under specified. Specifically I would expect an actual standard around locking to let you lock build requirements per requirement, I guess I'll have to chime in on that very very long discuss thread 🙁

notatallshaw avatar Sep 05 '24 12:09 notatallshaw

It's only a standards issue in the sense that pip's existing installer features don't currently work in this scenario

I guess that's true, yes. Given that pip is tending to add features based on standards these days, rather than innovating functionality, I think I'd rather see a standards-based solution for this, though.

No, but pip documents how to pin a requirement via hashes (https://pip.pypa.io/en/stable/topics/secure-installs/), and it allows a user to specify them in a constraints file, but then it doesn't functionally let the user constrain to those hashes.

As you noted, though, constraints files don't allow hashes. Constraint files have historically been very under-documented and prior to the new resolver implementation, had some odd behaviours. They were streamlined and clarified when we implemented them for the new resolver, to act in the "package finding" phase to limit what files the finder could see. With that design, I'm not sure that including hashes in a constraint file makes sense (hash checking happens much later in the install process, if I recall the details correctly).

Also, the details of what configuration is shared between the main pip process and the (recursive[^1]) build environment construction is fairly underspecified, having been based on some quite simplified assumptions and then extended as needed.

With all of that in mind, re-working the build environment creation process to correctly pass through and respect hashes is likely to be a complicated design and implementation task, and with pip's limited maintainer base, I'm not sure it's the most important thing for us to tackle. All of which is why I'd prefer it if we had a standards-based solution, so the design is done for us, up front.

For a lock file this seems a little under specified. Specifically I would expect an actual standard around locking to let you lock build requirements per requirement, I guess I'll have to chime in on that very very long discuss thread 🙁

I view that section of the PEP as indicative that the intention is to cover this area, but it's not something that has had extensive discussion, so yes, it may be under specified. I'd strongly advise you to point out any issues if you think that's the case, as there's a risk otherwise that it'll get missed with all of people's energy having been used up by questions like portability of lockfiles.

[^1]: It's all very well supplying hashes for all of your build dependencies, but what about your build backend's build dependencies? Will we hit a "hash has not been supplied" error on the next level down?
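The recursion concern in the footnote can be illustrated with a small Python sketch. It assumes the build requirements of every package are already known as a plain mapping, which pip does not expose today, so the mapping would have to be assembled by hand. The function name and the package names in the example are hypothetical.

```python
from typing import Dict, List, Set


def unpinned_build_requirements(
    root: str,
    build_requires: Dict[str, List[str]],
    hashed: Set[str],
) -> List[str]:
    """Walk build requirements recursively; return names lacking a hash.

    build_requires maps a package name to its build-system requirements.
    hashed is the set of names for which a hash has been supplied.
    """
    missing: List[str] = []
    seen: Set[str] = set()
    stack = list(build_requires.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guards against cycles between build backends
        seen.add(name)
        if name not in hashed:
            missing.append(name)
        # Descend into this requirement's own build requirements.
        stack.extend(build_requires.get(name, []))
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical dependency data, for illustration only.
    graph = {
        "my-project": ["backend-a", "helper-b"],
        "backend-a": ["bootstrap-c"],
        "bootstrap-c": ["backend-a"],
    }
    print(unpinned_build_requirements("my-project", graph, {"backend-a", "helper-b"}))
```

In the example data, hashes are supplied for the direct build requirements only, so the requirement one level down is the one reported as missing.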

pfmoore avatar Sep 05 '24 13:09 pfmoore