Support `pip download`
This would be especially useful for building Docker images. You could then rely on uv for a quick resolve and use a simple `pip install --no-deps --find-links` in your Dockerfile.
This is along similar lines to supporting `pip wheel`, discussed in https://github.com/astral-sh/uv/issues/1681.
Hi everyone, is this feature on the roadmap? I am guessing supporting `pip download` would be straightforward, since uv already downloads packages; it would just have to not install them.
Any help needed?
We should be able to support it... though it's not trivial, because we don't store the `.whl` files at all; we unzip them directly into the cache. So most of the data pipelines are oriented around an API that receives the unzipped wheel, rather than the zipped wheel.
What are the typical use-cases here?
> What are the typical use-cases here?
In my case it's a Docker build in a GitHub workflow. Caching Docker layers on GitHub runners is impossible AFAIK; caching `~/.cache` is trivial.
So a build could download wheels into the Docker working dir, then do

```dockerfile
COPY whls whls
RUN pip install --no-deps --find-links whls ...
```

which wouldn't hit PyPI and wouldn't need any additional caching from Docker.
This can be even better if you can do `RUN --mount ...`
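For reference, a minimal sketch of that bind-mount variant, assuming BuildKit and a `whls/` directory pre-populated on the runner (e.g. via `pip download -d whls -r requirements.txt`; names are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
COPY requirements.txt .
# Bind-mount the pre-downloaded wheels so they never persist as an image layer;
# --no-index guarantees the build never reaches out to PyPI.
RUN --mount=type=bind,source=whls,target=/whls \
    pip install --no-deps --no-index --find-links /whls -r requirements.txt
```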
I'm mostly wondering if it has to be wheels or if we could just make it easy to pre-populate the uv cache.
> I'm mostly wondering if it has to be wheels or if we could just make it easy to pre-populate the uv cache.
Wheels are supported by standard `pip install`. Otherwise you need to use uv inside `docker build`. Not bad necessarily, but a bit less flexible.
I'm not sure how much we should go out of our way to support using pip to consume an output of uv? It seems weird to use uv in one case and pip in another, right?
If it were equally easy for us I'd probably prefer to output wheels, it's a nicer intermediary format that's less coupled to our internal cache. I'd need to see how hard it is to support.
> What are the typical use-cases here?
Hi, my use case is that I have to supply bundles of my application with all dependencies for systems where it is not possible to download them (firewall blocking). Right now I use `pip download`, which results in a bunch of wheel files.
We would also like to be able to download them cross-platform.
That makes sense, thanks.
> I'm not sure how much we should go out of our way to support using pip to consume an output of uv? It seems weird to use uv in one case and pip in another, right?
That part of the workflow may not be entirely under your control. @inoa-jboliveira's example is a better one, I think, because it's essentially about packaging your application. Packaging may need to comply with a specific post-install procedure.
> Right now I use `pip download`, which results in a bunch of wheel files.
Note, `pip download` and `pip wheel` are similar, but there are crucial differences. If you're looking to package up your wheels, `pip wheel` is often recommended instead, since it covers cases where a download does not have a pre-built wheel, making it a more complete solution for pre-packaging wheels for a target system. As a result, I tend to just always use `pip wheel` nowadays to make sure I always have wheels, rather than potential source distributions that I'll have to build on the target system. From my perspective, `pip download` is more useful when you want to package up sdists or when you don't care whether everything you download is fully pre-built.
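To make the distinction concrete, a quick illustration (the package name is hypothetical, standing in for anything published only as an sdist):

```sh
# pip download saves whatever the index serves, which may be an sdist:
pip download --dest dist/ some-sdist-only-package   # may leave a .tar.gz in dist/

# pip wheel builds anything that isn't already a wheel, so dist/ ends up
# holding only .whl files ready for offline installation:
pip wheel --wheel-dir dist/ some-sdist-only-package
```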
Some of these tradeoffs were actually discussed in #1681.
Hi @samypr100, in this specific case I really just want to download pre-built wheels and not build anything. I can `pip download --platform foo` and I am good to go. That's why I need and still use `pip download`. As for `pip wheel`, I can't download (nor compile) anything cross-platform.
`pip download -d vendor/ --index-url internal_pypi internal-sdk` can be used with `uv pip install -f vendor` if we need to vendor anything.
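Spelled out end to end, treating the index URL and package name above as placeholders, that vendoring flow might look like:

```sh
# Vendor pinned wheels once (e.g. in CI) using pip's downloader:
pip download -d vendor/ --index-url internal_pypi internal-sdk

# Later, install fully offline with uv; --no-index keeps it from
# contacting any package index and forces resolution from vendor/.
uv pip install --no-index --find-links vendor/ internal-sdk
```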
Also quite interested in this, as `pip download` is one of the slower parts of our CI environment.
In our case, our security scanning tool requires running on a folder of wheel/source distributions; we currently use `pip download` to gather these.
Another use case would be downloading build time dependencies (in addition to runtime dependencies). I'm not sure if this is feasible, since it is not supported by pip (https://github.com/pypa/pip/issues/7863). However, this would be extremely useful when building a Flatpak which involves Python packages, which is currently broken because of that (https://github.com/flatpak/flatpak-builder-tools/issues/380).
Although `pip download` is very handy for multi-stage Docker builds to efficiently cache dependencies, uv doesn't store the .whl files; instead, it unzips them directly into the cache. It would be even better if the uv cache could be used to install requirements directly into the target stage instead of installing from the wheels. I'm curious whether there's a straightforward method to populate the uv cache for a list of packages.
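One hedged sketch of pre-populating the cache in a Docker build, using uv's published container image and a BuildKit cache mount (stage layout and versions are illustrative, not a confirmed uv workflow):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
# Copy the uv binary from its published image rather than pip-installing it.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
COPY requirements.txt .
# The cache mount persists /root/.cache/uv (uv's default cache dir for root)
# across builds, so rebuilds install from cache instead of re-downloading.
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system -r requirements.txt
```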
We have a business use case to scan dependencies of a Python project, we need to pip download requirements, it's slow without uv 😒
> We have a business use case to scan dependencies of a Python project, we need to pip download requirements, it's slow without uv 😒
Not saying you shouldn't use uv, but do you have an example where pip 24.2 is slow at downloading?
Especially if you've already pre-resolved the requirements with `uv pip compile`, as hopefully the biggest bottleneck is IO. I should be able to profile and see if there's any low-hanging fruit in pip that can be fixed.
It should also be a good scenario to see whether uv can advertise being faster here or not.
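For context, the pre-resolved variant referred to above would be something along these lines:

```sh
# Resolve once with uv (fast), then let pip download the exact pins with no
# resolution work of its own; --no-deps is safe because the compiled output
# already contains the full transitive closure.
uv pip compile requirements.in -o requirements.txt
pip download -r requirements.txt -d wheels/ --no-deps
```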
`pip download` is pretty OK/fast enough for my needs. I also open multiple processes and am constrained only by network, so uv won't be any faster without a cache (I'm not the guy above).
It is necessary for two reasons:
- fully replace pip and not need it as a dependency
- it is likely 99% done already, just missing the interface or some "do not unzip" flag for the actual data that is cached. Maybe re-zip cached wheels to avoid re-downloading them.
By using the cache, it would indeed be faster than pip for same-platform downloads. Although, for my use case, I need to download cross-platform, so the packages won't be there (e.g. numpy or pandas, which are large platform-specific packages).
> What are the typical use-cases here?
At the risk of repeating what other people have said, to chime in with my use case (also Docker image building for deployment, reproducibility is a secondary concern for me) and perhaps shed light on why pip download is important enough to support:
- In a Docker build context, there used to be (i.e. outdated practice) a flag to `pip install`, `--global-option=build_ext`, which would trigger a package build from source.
- Now if you want to do that, `build_ext` has been deprecated* in favour of an explicit build step, so to build from source you would first run `pip download` to download that source**, followed by building that (and `uv build` takes these downloaded sources to a wheel).
  - *because it kind of was considered out of scope for install commands to also be building, and presumably with the intro of the build backend toml format
  - **or get it from a repo's release files, but via pip was the standard way
- So the pseudocode workflow used to be "`pip install --build-and-install my-package`", and now it's "`pip download my-package`, `python setup.py build my-package my-wheel`, `pip install my-wheel`".
- With uv we'd perhaps be able to do something more integrated (`uv download+build+install my-package`), maybe with helpful optimisations like caching.
Also I’d note that as a user of these toolsets it can be confusing to keep up with the proliferation of ways to do the one thing as the state of the art evolves
I found an issue RE: pip wheel
when looking for this thread, and it notes that “pip would prefer to deprecate pip wheel” (#1681)
Hello everyone, I wanted to contribute some additional use cases for consideration. While most discussions here focus on the cloud, my perspective comes from the embedded world. Let’s consider the 10 billion devices currently operating on cellular networks, of which 1.8 billion are IoT/M2M devices. Many of these devices do not have access to "unlimited good bandwidth" but still require software updates, CI, etc.
When using Python, one way to accelerate these deployments is by pre-fetching PyPI packages ahead of time. This is not about supporting `pip download`, but rather working with cached downloads. For example, you could use a method like this to pre-fetch big packages like TensorFlow (~400MB):
```sh
file_url="https://files.pythonhosted.org/packages/5e/31/d49a3dff9c4ca6e6c09c2c5fea95f58cf59cc3cd4f0d557069c7dccd6f57/tensorflow-2.7.4-cp39-cp39-manylinux2010_x86_64.whl"
wget --continue --quiet -P . "$file_url"
```
And then the actual software deployment could use this pre-fetched file and get the rest of the dependencies from PyPI directly.
By enabling uv to use these pre-fetched/cached files, deployments become more efficient. Also, it's intuitive for a user to think about uv operations as downloading, storing, and installing.
> By enabling uv to use these pre-fetched/cached files, deployments become more efficient. Also, it's intuitive for a user to think about uv operations as downloading, storing, and installing.
I don't think this is the same issue? And should already be possible.
If you have acquired the wheels you can install directly from them (or use `--find-links`):
```console
$ pip download requests --no-deps
Collecting requests
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
Saved ./requests-2.32.3-py3-none-any.whl
$ uv pip install ./requests-2.32.3-py3-none-any.whl
Resolved 5 packages in 176ms
Prepared 5 packages in 120ms
Installed 5 packages in 37ms
 + certifi==2024.8.30
 + charset-normalizer==3.3.2
 + idna==3.8
 + requests==2.32.3 (from file:///home/dshaw/uvtest/3163/requests-2.32.3-py3-none-any.whl)
 + urllib3==2.2.3
```
Also, pretty sure you can install with uv on your base machine, copy the cache to other devices, and then point uv's cache to the copied directory; this should use even fewer resources (CPU and storage) on your IoT devices, as there's no extra step of unzipping the contents and storing them somewhere.
> I don't think this is the same issue? And should already be possible.
Unfortunately it's not possible. I have a better explanation here if you are interested: #7296.
> `$ uv pip install ./requests-2.32.3-py3-none-any.whl`
Installing a single package isn't the goal here, but managing all the dependencies with uv; basically doing something like `uv sync --find-links [path-to-some-pre-fetched-wheels]`.
> copy the cache to other devices, and then point uv's cache to the copied directory
In this use case, the biggest problem is data usage (for some devices, you pay per MB of usage), and uv's cache contains the unzipped versions of wheels. For example, TensorFlow, which is ~400MB, expands into GB of data.
> In this use case, the biggest problem is data usage (for some devices, you pay per MB of usage), and uv's cache contains the unzipped versions of wheels. For example, TensorFlow, which is ~400MB, expands into GB of data.
Is that the data copied onto the device before it does any downloading, or the total amount of storage on the device?
Because if it's the total amount of storage on the device, then you will use less space by copying the cache: copying the wheel will take up the wheel + the install, whereas copying the cache will just be the install, and the site-packages location will just hard-link into the cache and use no additional space.
If it's the initial copy onto the device, you could zip the uv cache up and then have a small script that unzips it into the actual uv cache folder and then deletes the zip.
I'm not saying it wouldn't be helpful for uv to have a download function and what you propose in https://github.com/astral-sh/uv/issues/7296; just spitballing solutions with existing tools.
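A sketch of that zip-the-cache idea, assuming compression of the transfer is what saves the metered bandwidth (paths and the throwaway venv are illustrative):

```sh
# On the build machine: warm uv's cache by installing into a throwaway venv.
uv venv /tmp/warm
VIRTUAL_ENV=/tmp/warm uv pip install -r requirements.txt
tar czf uv-cache.tgz -C ~/.cache uv

# On the device: unpack the cache, drop the archive, install without network.
mkdir -p ~/.cache && tar xzf uv-cache.tgz -C ~/.cache && rm uv-cache.tgz
uv pip install --offline -r requirements.txt
```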