Poetry downloading same wheels multiple times within a single invocation
- [x] I am on the latest Poetry version.
- [x] I have searched the issues of this repo and believe that this is not a duplicate.
- [ ] If an exception occurs when executing a command, I executed it again in debug mode (`-vvv` option).
- OS version and name: macOS 10.14.6
- Poetry version: 1.0.5
- Link of a Gist with the contents of your pyproject.toml file: https://gist.github.com/bb/501f33ad3f35eb8c26ce2513ca6074c8
Issue
When adding a new dependency, it is downloaded multiple times; I observed three downloads, two of which are unnecessary.
Starting with a `pyproject.toml` as in the Gist given above, I run:

```
poetry add https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl
```
Then I see the following output (`XXX` markers added for the explanation below):

```
Updating dependencies XXX
Resolving dependencies... (276.1s)

Writing lock file
XXX
Package operations: 0 installs, 7 updates, 0 removals

  - Updating certifi (2019.11.28 -> 2020.4.5.1)
  - Updating urllib3 (1.25.8 -> 1.25.9)
  - Updating asgiref (3.2.3 -> 3.2.7)
  - Updating pytz (2019.3 -> 2020.1)
  - Updating django (3.0.4 -> 3.0.6)
  - Updating hu-core-ud-lg (0.3.1 -> 0.3.1 https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl)
XXX - Updating psycopg2-binary (2.8.4 -> 2.8.5)
```
At each position marked `XXX`, the same 1.3 GB download is performed again. Similarly, when adding another package later, `XXX` again marks the cursor position while the big download happens:
```
$ poetry add djangorestframework
Using version ^3.11.0 for djangorestframework

Updating dependencies
Resolving dependencies... (0.4s)

Writing lock file
XXX
Package operations: 1 install, 1 update, 0 removals

  - Installing djangorestframework (3.11.0)
  - Updating hu-core-ud-lg (0.3.1 -> 0.3.1 https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl)
XXX
```
I'd expect the file to be downloaded at most once and reused.
Slightly related but different issues: #999, #2094
The first two downloads happen in https://github.com/python-poetry/poetry/blob/41a8a470ec693b90f1353c37d60b398711bf2f29/src/poetry/puzzle/provider.py#L413-L418, which indeed uses a temporary location that is immediately thrown away.
Presumably the right thing to share would be the artifact cache as used by the `Chef`?
What's the reasoning for dumping the downloads to a `temp_dir` as @dimbleby shows in the snippet? Is it so the cache doesn't blow out to a massive size?
I'd be happy to try and contribute. Naively, I'd check a cache wherever `download_file` is called (`puzzle/provider.py` and `repositories/http.py`), but there are likely some considerations I'm missing. If someone could advise, I could put together a PR.
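To make the idea concrete, here is a hedged sketch of such a URL-keyed download cache. The helper name and layout are hypothetical, not Poetry's actual code; the point is only that any caller asking for the same URL gets the same on-disk file, so a second invocation becomes a cache hit instead of a re-download.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def cached_download(url: str, cache_root: Path) -> Path:
    """Download *url* at most once; later calls reuse the cached copy.

    Hypothetical sketch of the proposal: the cache key is derived from
    the URL alone, so any code path (solver or installer) that asks for
    the same URL lands on the same file.
    """
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    dest = cache_root / key[:2] / key / url.rsplit("/", 1)[-1]
    if dest.exists():
        return dest  # cache hit: no network traffic at all
    dest.parent.mkdir(parents=True, exist_ok=True)
    part = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(part, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    part.rename(dest)  # publish into the cache only once complete
    return dest
```

The `.part` rename is there so an interrupted download never leaves a truncated file that later calls would mistake for a valid cache entry.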
I suspect that code fragment uses a temporary directory for no particularly good reason.
Poetry has a cache of downloaded files that it uses during installation, managed by the curiously named `Chef` class. I'd think that is the right thing to share with.
A couple of problems, though:
- the chef uses such things as the current interpreter version to decide where to put these files, which is an unwanted complication
- it's not entirely clear how to refactor to make the chef cache available during solving
I'd start with an MR that updates the chef so that `get_cache_directory_for_link` only cares about the URL that the link is downloaded from - that should be straightforward, and will get maintainer opinion on whether this is a sensible track.
Then, if that's accepted, follow up with some sort of rearrangement so that this cache can be shared by the chef and the solving code.
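For illustration, a URL-only cache key along these lines could look like the following (function name and directory layout are hypothetical; the real `Chef` code differs):

```python
import hashlib
from pathlib import Path


def cache_directory_for_url(cache_root: Path, url: str) -> Path:
    """Map a distfile URL to a stable cache directory.

    Sketch of the suggestion above: the key depends only on the URL,
    not on the interpreter version or any other environment detail,
    so the solver and the installer can locate the same artifact.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Fan out into nested subdirectories so no single directory
    # accumulates too many entries.
    return cache_root / digest[:2] / digest[2:4] / digest[4:6] / digest[6:]
```

Because the key is a pure function of the URL, two different code paths that both resolve the same wheel URL will always agree on where the cached file lives.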
Thanks @dimbleby I'll take a look and see what I can do.
This is a serious problem with packages like PyTorch which are extremely large. Unless there's a workaround for this I will definitely never use Poetry.
Any update on a fix for this? I really like poetry but locking or adding a new dependency now takes > 5 minutes because I have to download wheels for torch, torchaudio, and torchvision. Is there a short-term workaround while a more permanent fix is made? Thank you.
I suspect many are reading this issue without actually having experienced the issue -- Poetry downloads Torch once for metadata + hashing, and a second time for actual installation. After the cache is created, Poetry will not re-download Torch. We are downloading distfiles more often than needed because two parts of the code do not share a common cache, but we are not downloading on every `poetry add` or anything similar.
> I suspect many are reading this issue without actually having experienced the issue -- Poetry downloads Torch once for metadata + hashing, and a second time for actual installation. After the cache is created, Poetry will not re-download Torch. We are downloading distfiles more often than needed as two parts of the code do not share a common cache, but we are not downloading every time `poetry add` occurs or anything similar.
Thanks for the reply. I am an active user of Poetry running 1.2.1, and I do experience the issue: the PyTorch wheel downloads every time I do an add or lock, and it takes around 80 seconds each time.
https://user-images.githubusercontent.com/47190785/195437531-0f111694-2fab-4f54-a8d8-73c46fcc9961.mp4
Every time I run `poetry update` in my project, a large spaCy model gets downloaded.
It is added to `[tool.poetry.dependencies]` this way:

```toml
en_core_web_lg = { url = "https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz" }
```
I think this is related: in a project I have these conditional URL dependencies defined:

```toml
torch = [
    {url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl", markers = "sys_platform == 'linux'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'x86_64'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"}
]
```

Every `poetry lock` operation ends up redownloading the 3 wheels, which are quite large. Isn't there a way to have them cached by Poetry?
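For context on why all three wheels get fetched: only one marker can match any given machine at install time, but during locking the solver needs metadata for every candidate, and without a shared cache that currently means downloading each file. The selection the markers express can be mirrored by a small hypothetical helper (this is illustrative only, not Poetry's resolver code):

```python
import platform
import sys
from typing import Optional

# Hypothetical mirror of the environment markers above.
TORCH_WHEELS = [
    ("https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl",
     lambda plat, mach: plat == "linux"),
    ("https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl",
     lambda plat, mach: plat == "darwin" and mach == "x86_64"),
    ("https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl",
     lambda plat, mach: plat == "darwin" and mach == "arm64"),
]


def select_wheel(plat: str = sys.platform,
                 mach: str = platform.machine()) -> Optional[str]:
    """Return the first URL whose marker matches the given platform."""
    for url, matches in TORCH_WHEELS:
        if matches(plat, mach):
            return url
    return None
```

So at install time exactly one (or zero) of the three URLs applies, yet the lock step still touches all of them.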
> I think this is related: in a project I have these conditional URL dependencies defined
>
> `torch = [ {url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl", markers = "sys_platform == 'linux'"}, {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'x86_64'"}, {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"} ]`
>
> Every `poetry lock` operation ends up redownloading the 3 wheels, which are quite large. Isn't there a way to have them cached by poetry?
On my system also, this seemed to make Poetry re-download `torch` every time it resolved dependencies. It did not happen with other dependencies that were given by name (to be downloaded from PyPI, with no URL).
Since PyTorch URLs have to be hard-coded to install properly, and PyTorch's wheel is more than 1 GB, this prevents me from migrating the team to Poetry.
Ah, looking at this, I realize that all the metadata caching happens in the repository layer. So if you're using direct URL dependencies, Poetry has no caching whatsoever. I personally got turned around here on whether this was a bug or as-designed behavior (currently, the latter is true).
Ideally the artifacts cache could be made agnostic to repositories so that it is keyed on URLs only and we can share it, as @dimbleby has mentioned. On top of that, I wonder if some mechanism to cache metadata (maybe a direct-URL `CachedRepository`?) could be implemented, as all that code is currently tied up with indexes.
I'm also experiencing this issue, and it's unfortunate, as I now have to choose between installing a specific wheel for my architecture,

```toml
torch = { url = "https://download.pytorch.org/whl/cpu/torch-1.9.0-cp38-none-macosx_11_0_arm64.whl", markers = "platform_machine == 'arm64' and platform_system == 'Darwin'" }
```

which is quicker to install but makes dependency resolution super slow, or installing the full `torch = "1.9.0"`, which takes longer to install but avoids the slow resolution times.
It would be great if there was caching for direct URL dependencies as well, as neither option is ideal right now 😞
I'm having the same problem in my project: if you have any packages with `{ url = ... }`, then every `poetry add`, `poetry lock`, or `poetry update` downloads them again. As a temporary solution, I'm using a requirements.txt for the URL packages and pyproject.toml for the rest, while waiting for a fix.
I think we've pretty firmly established what is going on and what is needed to improve -- I'd ask that people please refrain from "me too" as it's just adding noise right now.
@neersighted, I'm not just saying "me too"; I was sharing a workaround I'm currently using to help others with the same problem. In my Dockerfile I install the Poetry dependencies and the additional URL dependencies with pip into the same Python env. Please refrain from adding comments that just create noise in the topic, and try to read the comment to the end. Thank you.
I understand you are using tools other than Poetry to avoid our lack of caching here, yes. However, it does not add anything to this issue as it is essentially saying "I work around this deficiency in Poetry by not using Poetry" -- it's rather a tautology.
My comment is also not aimed at you in particular -- it's generalized and often made on high-traffic issues once the problem is defined.
Anyway, I'm going to minimize these comments as unhelpful; if anyone is interested in implementing the strategy @dimbleby/I have mentioned and needs guidance, please feel free to reach out via Discussions/Discord/a draft PR.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.