Pip install `--dry-run` shouldn't download full wheels when metadata file available

Open notatallshaw opened this issue 1 year ago • 16 comments

Description

When running `pip install --dry-run {package}`, pip downloads the metadata file and then the full wheel.

Expected behavior

Dry run installs don't need to download the full wheels

pip version

24.0

Python version

3.11

OS

Linux

How to Reproduce

pip install --dry-run kaleido==0.2.1

Output

$ pip install --dry-run kaleido==0.2.1
Collecting kaleido==0.2.1
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl.metadata (15 kB)
Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.9/79.9 MB 34.6 MB/s eta 0:00:00
Would install kaleido-0.2.1
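
For context, the `.whl.metadata` file fetched in the first download step is the wheel's core metadata, served as a separate file next to the wheel (PEP 658). The name, version and dependencies are therefore already known before the 79.9 MB download starts. A minimal sketch of reading such a file with Python's standard library follows. The sample text and the `summarize_metadata` helper are made up for illustration and are not pip's internal code.

```python
from email.parser import Parser

# Made-up sample in the core metadata format (RFC 822 style headers).
# It is NOT the real metadata of any published package.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.0
Requires-Python: >=3.8
Requires-Dist: requests>=2.0
Requires-Dist: rich; extra == "cli"

Long description goes here.
"""


def summarize_metadata(text):
    """Return the fields a resolver needs: name, version and dependencies."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # A field may repeat, so collect every Requires-Dist header.
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


summary = summarize_metadata(SAMPLE_METADATA)
print(summary["name"], summary["version"], summary["requires_dist"])
```

Nothing in this sketch needs the wheel's contents, which is the point of the request: the information reported by a dry run is available from the metadata file alone.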

notatallshaw avatar Mar 28 '24 18:03 notatallshaw

PRs welcome. However, one potential upside of downloading the wheels is that if the dry run is followed by an actual install, the install itself will be faster as everything needed will be cached. So it's not immediately obvious that this change would be beneficial overall.

pfmoore avatar Mar 28 '24 18:03 pfmoore

> PRs welcome. However, one potential upside of downloading the wheels is that if the dry run is followed by an actual install, the install itself will be faster as everything needed will be cached. So it's not immediately obvious that this change would be beneficial overall.

I think there are two main use cases for dry run: one where you're validating what would be installed and then installing, and one where you're collecting the output of the dry run as part of a larger environment management process. In the second case you may not be installing locally; you could be preparing files for a docker build, or many other things.

I think you're thinking of the first case. Even then, you may decide that the versions are wrong and want to update the requirements, possibly several times. If the wheels are very large, that's a lot of wasted download time as you iterate toward the right requirements.

notatallshaw avatar Mar 28 '24 18:03 notatallshaw

I've inserted the --dry-run option, but it seems to be ignored by pip. Is that unexpected behavior related to this issue?

rootsmusic avatar Apr 11 '24 15:04 rootsmusic

> I've inserted the --dry-run option, but it seems to be ignored by pip. Is that unexpected behavior related to this issue?

Dry run works fine for me, in the sense that it doesn't install any packages. If you have an issue that appears to be a bug, create a new issue with steps to reproduce, including details of your environment such as your pip version.

notatallshaw avatar Apr 11 '24 15:04 notatallshaw

> Dry run works fine for me

@notatallshaw, you're right. I mistakenly assumed that it wasn't, because the output says "Downloading" (which I quickly interrupted). At the end, the output does say "would install" the downloaded package.

rootsmusic avatar Apr 11 '24 15:04 rootsmusic

I started working on a PR, and I think it's going to be a smaller change once https://github.com/pypa/pip/pull/12300 lands, because there are some specific legacy version and specifier warnings which assume the full wheel file is there. So I'm going to wait until that is no longer a concern rather than trying to fix that code.

It will still be bigger than a 10-line change, because `InstallRequirement` needs to be changed so it can use the metadata file, if it is there, instead of the wheel file. This seems like a positive change, though: it should mean less IO.

notatallshaw avatar Apr 13 '24 20:04 notatallshaw

x-ref https://github.com/pypa/pip/pull/12186

pradyunsg avatar Apr 14 '24 00:04 pradyunsg

That PR seems effectively dead; hopefully I can touch this code in a much simpler manner.

notatallshaw avatar Apr 14 '24 02:04 notatallshaw

> where you're collecting the output of dry run as part of a larger environment management process

We have this use case as well. We use --dry-run with --report to figure out dependencies. The wheel downloads slow down the process significantly.
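
One way this workflow might look, as a rough sketch rather than anyone's actual tooling: run pip with `--dry-run --report -` and read the resolved pins from the JSON installation report. The helper names `dry_run_report` and `pinned_requirements` are hypothetical, `--ignore-installed` is an optional choice made here so the result does not depend on the current environment, and the sample report is a hand-written fragment, not real pip output.

```python
import json
import subprocess
import sys


def dry_run_report(requirements):
    """Run pip in dry-run mode and return its JSON installation report."""
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--dry-run",           # resolve only, install nothing
        "--ignore-installed",  # resolve as if the environment were empty
        "--quiet",             # keep stdout clean for the JSON report
        "--report", "-",       # write the report to stdout
        *requirements,
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def pinned_requirements(report):
    """List 'name==version' for every distribution pip would install."""
    return sorted(
        f"{item['metadata']['name']}=={item['metadata']['version']}"
        for item in report["install"]
    )


# Hand-written fragment shaped like a report, so no network access is needed here.
sample_report = {
    "version": "1",
    "install": [
        {"metadata": {"name": "kaleido", "version": "0.2.1"}, "requested": True},
    ],
}
print(pinned_requirements(sample_report))
```

A call such as `dry_run_report(["kaleido==0.2.1"])` produces the same kind of report, but on the pip version discussed in this issue it still downloads the full wheel first, which is the slowdown described above.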

sfc-gh-nsharma avatar May 31 '24 01:05 sfc-gh-nsharma

@notatallshaw - just checking: how are you getting along with the PR? I was surprised to discover that dry-run does the full download, and personally I will be very happy when dry-run simply uses repository metadata to figure out what it would install.

pelson avatar Jun 22 '24 03:06 pelson

I've not had time to work on it. I might revisit it in a few weeks, but I was only going to submit it if I could make a relatively simple change. If anyone else wants to give it a go, by all means don't wait for me.

notatallshaw avatar Jun 22 '24 04:06 notatallshaw

fwiw, for the time being, I've been happily running https://github.com/pypa/pip/pull/12186 on my machine:

pip install 'pip @ https://github.com/cosmicexplorer/pip/archive/refs/heads/metadata_only_resolve_no_whl.zip'

ddelange avatar Jul 02 '24 11:07 ddelange

I was just about to file the same issue :)

P.S. @notatallshaw I can't edit the issue title. Could you fix the typo in the word “shouldn't”?

webknjaz avatar Aug 05 '24 14:08 webknjaz

Done.

Also there was some recent progress in https://github.com/pypa/pip/pull/12863

Personally, I'd quite like the work done by @cosmicexplorer to land.

notatallshaw avatar Aug 05 '24 14:08 notatallshaw

Just sharing a concrete example that brought me here. I have a container with PyTorch 2.2 and CUDA 12.3, and I used --dry-run to see what adding xformers would modify. Running pip install --dry-run xformers downloaded Torch 2.3 and all the Nvidia-related content, which is over 1 GB.

mmartial avatar Aug 05 '24 21:08 mmartial

@notatallshaw please keep pinging me if you perceive any blocking on my part! I have extreme confidence in my approach for all of my open PRs, which has been refined and honed over years starting from #7819 ever since I realized pip was the right place for the optimization work I was doing for Twitter Cortex ML in pantsbuild/pants#8793 (which produced install --report and --use-feature=fast-deps). I have the time and energy to make sure these all land, and I have the utmost respect for the pip maintainers, who are some of my favorite people to interact with and learn from. I know we're all busy and doing several things at once, and I left these PRs hanging for a few months earlier this year, which definitely slowed things down (sorry!).

If you apply all my current diffs (https://github.com/pypa/pip/pulls/cosmicexplorer) in series (the last one is #12258, sorry I need to rebase this, I'll do that now), you will get a truly fantastic performance improvement with minimal complexity, making use of the metadata resolution framework introduced in prior PRs to read metadata from cache (and even further). In addition to performance, it will also drastically reduce the number and magnitude of HTTP requests made against pypi: see #12256. Each of these subsequent PRs demonstrates a robust performance improvement, especially when resolving large binary wheels for ML frameworks like @mmartial discussed. The overall performance improvement is quite drastic (especially with --dry-run for large wheels), and after I rebase them all I can show clear benchmarks. Please let me know if I can improve my presentation/proposal of any of these changes to make them more convincing.

On a personal level, I really appreciate you advocating for my work, and I would love if you could continue to help nudge me to make sure this gets done. I'm @[email protected] on mastodon and if you DM me there or on twitter I will be more likely to respond. I think this code is good, I think it's right for pip, and I will do my part to keep iterating on these PRs until they're pip quality.

cosmicexplorer avatar Aug 13 '24 09:08 cosmicexplorer

> PRs welcome. However, one potential upside of downloading the wheels is that if the dry run is followed by an actual install, the install itself will be faster as everything needed will be cached. So it's not immediately obvious that this change would be beneficial overall.

I think it's an upside for regular python apps, but with pytorch, xformers, accelerate etc it looks like this:

(C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui\installer_files\env) C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui>pip install --dry-run torch==2.3.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch==2.3.0
  Downloading https://download.pytorch.org/whl/cu118/torch-2.3.0%2Bcu118-cp310-cp310-win_amd64.whl (2673.0 MB)
     ━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.7 GB 40.9 MB/s eta 0:01:02
ERROR: Operation cancelled by user

(C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui\installer_files\env) C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui>pip install --dry-run torch==2.5.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch==2.5.0
  Downloading https://download.pytorch.org/whl/cu118/torch-2.5.0%2Bcu118-cp310-cp310-win_amd64.whl (2700.2 MB)
     ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.7 GB 40.9 MB/s eta 0:01:04
ERROR: Operation cancelled by user

rsxdalv avatar Oct 24 '24 12:10 rsxdalv

you can use one of @cosmicexplorer's branches:

pip install 'pip @ https://github.com/cosmicexplorer/pip/archive/refs/heads/metadata_only_resolve_no_whl.zip'

they are being tracked here: https://github.com/pypa/pip/issues/12921

ddelange avatar Oct 24 '24 12:10 ddelange