pip 24.1.1 tries to rebuild already installed sdist referenced by direct URL

Open ncoghlan opened this issue 1 year ago • 2 comments

Description

After setting up an environment, I later need to use pip install -r requirements.txt to check if any packages previously available from other sys.path entries now need to be installed directly into the environment (while this isn't the exact scenario, it's similar to what happens when system packages get uninstalled after an environment was originally set up with --system-site-packages).

This second install command passes --no-index, --no-deps, and --only-binary ':all:', since there is no reason for this second pass to need any source builds or artifact downloads: everything would have been cached during the original install (either of the base environment or of one of the layered environments built on top of it).

At this point, a few things go wrong:

  • the already installed copy of the directly referenced distribution gets ignored: pip tries to reinstall it even though --upgrade has not been specified for this second pass (which is only aimed at filling in now-missing dependencies, not upgrading or downgrading anything)
  • the previously cached build for the directly referenced sdist gets ignored (I originally thought this was because there are hashes present in the requirements file and no way to tell pip that it's OK to use the cached wheels even though their hashes aren't explicitly listed, but the simplified reproducer below shows that isn't the case)
  • the --only-binary setting gets ignored for the direct URL sdist reference (pip tries to build it instead of just skipping it)

Expected behavior

I expected to see Requirement already satisfied reported for the already installed directly referenced sdist instead of an attempted isolated build that fails because index access is disabled.
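To make the expectation concrete, here is a minimal sketch of the kind of check I had in mind. It is not pip's actual logic: it assumes the installed distribution carries the direct_url.json record described by PEP 610, and the helper names (recorded_direct_url, already_satisfied) are made up for illustration. A matching URL alone does not prove the remote content is unchanged.

```python
import json
from importlib import metadata


def recorded_direct_url(dist_name):
    """Return the URL stored in direct_url.json for an installed
    distribution, or None if it is not installed or has no such record."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")  # None when the file is absent
    if raw is None:
        return None
    return json.loads(raw).get("url")


def already_satisfied(dist_name, requested_url):
    """Hypothetical check: was dist_name installed from requested_url?

    Any "#sha256=..." style fragment is ignored on both sides, so only
    the URL proper is compared.
    """
    recorded = recorded_direct_url(dist_name)
    if recorded is None:
        return False
    return recorded.split("#", 1)[0] == requested_url.split("#", 1)[0]
```

Run inside the test environment after step 1 of the reproducer, already_satisfied("contextlib2", url) would be expected to return True for the same archive URL, assuming the record was written at install time.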

pip version

24.1.1

Python version

3.12

OS

Fedora 40 (WSL remix)

How to Reproduce

In a fresh virtual environment:

  1. Run bin/python -m pip install contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
  2. Run bin/python -m pip install --no-index --no-deps --only-binary ':all:' contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip

(This isn't the project where I ran into the problem, but it illustrates the issue without bringing in a massive tree of ML/AI dependencies)

Adding a hash fragment to the direct URL reference (#sha256=18558d0007e33caf2c28070cc70ed2e8445e5af60264ea6b88d79b760953c3bc for the given example) doesn't change the behaviour (the already cached wheel build still gets ignored)
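For anyone wanting to try the hashed variant with a different archive, the fragment is just the SHA-256 hex digest of the archive bytes. A small sketch follows; the function names are illustrative only, and the archive is assumed to have been downloaded separately and read into memory.

```python
import hashlib


def sha256_fragment(archive_bytes):
    """Build the "#sha256=<hex digest>" fragment for the given archive bytes."""
    return "#sha256=" + hashlib.sha256(archive_bytes).hexdigest()


def pin_url(url, archive_bytes):
    """Return url with any existing fragment replaced by a sha256 fragment."""
    base = url.split("#", 1)[0]
    return base + sha256_fragment(archive_bytes)
```

Passing the bytes of the downloaded 21.6.0.zip to pin_url gives a direct reference of the same form as the one used above.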

Output

Install a directly referenced sdist:

acoghlan@FROZENVAPOUR:~/test_venv$ bin/python -m pip install contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
Collecting contextlib2@ https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
  Downloading https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
     - 52.5 kB 1.2 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: contextlib2
  Building wheel for contextlib2 (pyproject.toml) ... done
  Created wheel for contextlib2: filename=contextlib2-21.6.0-py2.py3-none-any.whl size=13219 sha256=88b657cca62a71403468e1285614b65a02ea6c7487ad89c36f69d34c4c32c63a
  Stored in directory: /tmp/pip-ephem-wheel-cache-um1pg8wu/wheels/93/98/ef/a94ffd2b8dc653380f6ea18153fe981054461fa1eca8fff83b
Successfully built contextlib2
Installing collected packages: contextlib2
Successfully installed contextlib2-21.6.0

Try to reinstall that sdist without allowing downloads or source builds:

acoghlan@FROZENVAPOUR:~/test_venv$ bin/python -m pip install --no-index --no-deps --only-binary ':all:' contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
Collecting contextlib2@ https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
  Using cached https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [2 lines of output]
      ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
      ERROR: No matching distribution found for setuptools>=40.8.0
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

In this case, it's absolutely a problem with pip: why is it trying to rebuild an already installed package that it wasn't asked to upgrade, that it doesn't need dependency information from, and was definitively told not to try to build from source?

Dropping the restrictions on the second install shows that it's the unconditional metadata retrieval that is causing this failure:

$ bin/python -m pip install contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
Collecting contextlib2@ https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
  Using cached https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

ncoghlan, Jun 28 '24 05:06

The problem here is that we have no way of knowing that the source at a given URL hasn't changed. There are people whose workflows rely on pip install <url> doing a rebuild - the obvious example is pip install . - so we take the conservative view that we need to rebuild all URL requirements.

This is noted in the documentation:

Wheels built from source distributions provided to pip as a direct path (such as pip install .) are not cached across runs, though they may be reused within the same pip execution.

In this particular case (--no-deps --only-binary ':all:'), rebuilding is clearly not wanted, but I'd argue that the real difficulty is combining a URL to a source package with --only-binary ':all:'. We should reject that combination with an error such as "Cannot specify --only-binary with a source URL". We certainly shouldn't satisfy a source URL requirement with a cached binary that we can't guarantee was built from the same content that the URL now points to.
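As a rough sketch of what such a rejection could look like (this is not pip code, and the function names are hypothetical), assume a direct URL counts as binary only when its path ends in .whl. That is a simplification: real direct references can also be local directories or VCS URLs.

```python
from urllib.parse import urlsplit


def is_wheel_url(url):
    """Treat a URL as a binary reference only if its path ends in .whl."""
    return urlsplit(url).path.lower().endswith(".whl")


def check_only_binary(url, only_binary):
    """Raise if a source URL is combined with an only-binary restriction."""
    if only_binary and not is_wheel_url(url):
        raise ValueError("Cannot specify --only-binary with a source URL")
```

With the URL from the report, check_only_binary(url, only_binary=True) raises immediately instead of attempting a build that cannot succeed.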

pfmoore, Jun 28 '24 09:06

FWIW, it appears uv does not attempt to reinstall, using:

$ uv pip install contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip
$ uv pip install --no-index --no-deps --only-binary ':all:' contextlib2@https://github.com/jazzband/contextlib2/archive/refs/tags/21.6.0.zip

It could be, though, that this is only because no one using uv has been bitten by it yet. From what I've observed, their approach has been to cache aggressively and then, as users report problems, back off and reinstall packages in the face of ambiguity.

notatallshaw, Jun 28 '24 13:06