Poetry downloading same wheels multiple times within a single invocation
- [x] I am on the latest Poetry version.
- [x] I have searched the issues of this repo and believe that this is not a duplicate.
- [ ] If an exception occurs when executing a command, I executed it again in debug mode (`-vvv` option).
- OS version and name: macOS 10.14.6
- Poetry version: 1.0.5
- Link of a Gist with the contents of your pyproject.toml file: https://gist.github.com/bb/501f33ad3f35eb8c26ce2513ca6074c8
Issue
When adding a new dependency, it is downloaded multiple times; I observed three downloads, two of which are unnecessary.
Starting with a `pyproject.toml` as in the Gist given above, I run:

```
poetry add https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl
```
Then I see the following output (`XXX` markers added for the explanation below):

```
Updating dependencies XXX
Resolving dependencies... (276.1s)

Writing lock file
XXX
Package operations: 0 installs, 7 updates, 0 removals

  - Updating certifi (2019.11.28 -> 2020.4.5.1)
  - Updating urllib3 (1.25.8 -> 1.25.9)
  - Updating asgiref (3.2.3 -> 3.2.7)
  - Updating pytz (2019.3 -> 2020.1)
  - Updating django (3.0.4 -> 3.0.6)
  - Updating hu-core-ud-lg (0.3.1 -> 0.3.1 https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl)
XXX - Updating psycopg2-binary (2.8.4 -> 2.8.5)
```
At each position marked `XXX`, the same 1.3 GB download is performed again. Similarly, when adding another package later, `XXX` again marks the cursor position while the big download happens:
```
$ poetry add djangorestframework
Using version ^3.11.0 for djangorestframework

Updating dependencies
Resolving dependencies... (0.4s)

Writing lock file
XXX
Package operations: 1 install, 1 update, 0 removals

  - Installing djangorestframework (3.11.0)
  - Updating hu-core-ud-lg (0.3.1 -> 0.3.1 https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl)
XXX
```
I'd expect the file to be downloaded at most once and reused.
Slightly related but different issues: #999, #2094
The first two downloads happen in https://github.com/python-poetry/poetry/blob/41a8a470ec693b90f1353c37d60b398711bf2f29/src/poetry/puzzle/provider.py#L413-L418, which indeed uses a temporary location that is immediately thrown away.
Presumably the right thing to share would be the artifact cache as used by the `Chef`?
What's the reasoning for dumping the downloads to a `temp_dir` as @dimbleby shows in the snippet? Is it so the cache doesn't blow out to a massive size?
I'd be happy to try and contribute. Naively, I'd check a cache wherever `download_file` is called (`puzzle/provider.py` and `repositories/http.py`), but there are likely some considerations I'm missing. If someone could advise, I could put together a PR.
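To make the idea concrete, here is a hedged sketch of such a URL-keyed download cache. The helper name and layout are hypothetical, not Poetry's actual code; the point is only that any caller asking for the same URL gets the same on-disk file, so a second invocation becomes a cache hit instead of a re-download.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def cached_download(url: str, cache_root: Path) -> Path:
    """Download *url* at most once; later calls reuse the cached copy.

    Hypothetical sketch of the proposal: the cache key is derived from
    the URL alone, so any code path (solver or installer) that asks for
    the same URL lands on the same file.
    """
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    dest = cache_root / key[:2] / key / url.rsplit("/", 1)[-1]
    if dest.exists():
        return dest  # cache hit: no network traffic at all
    dest.parent.mkdir(parents=True, exist_ok=True)
    part = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(part, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    part.rename(dest)  # publish into the cache only once complete
    return dest
```

The `.part` rename is there so an interrupted download never leaves a truncated file that later calls would mistake for a valid cache entry.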
I suspect that code fragment uses a temporary directory for no particularly good reason.
Poetry has a cache of downloaded files that it uses during installation, managed by the curiously named `Chef` class. I'd think that is the right thing to share with.
A couple of problems, though:
- the chef uses such things as the current interpreter version to decide where to put these files, which is an unwanted complication
- it's not entirely clear how to refactor to make the chef cache available during solving
I'd start with an MR that updates the chef so that `get_cache_directory_for_link` only cares about the URL that the link is downloaded from - that should be straightforward, and will get maintainer opinion on whether this is a sensible track.
Then, if that's accepted, follow up with some sort of rearrangement so that this cache can be shared by the chef and the solving code.
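For illustration, a URL-only cache key along these lines could look like the following (function name and directory layout are hypothetical; the real `Chef` code differs):

```python
import hashlib
from pathlib import Path


def cache_directory_for_url(cache_root: Path, url: str) -> Path:
    """Map a distfile URL to a stable cache directory.

    Sketch of the suggestion above: the key depends only on the URL,
    not on the interpreter version or any other environment detail,
    so the solver and the installer can locate the same artifact.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Fan out into nested subdirectories so no single directory
    # accumulates too many entries.
    return cache_root / digest[:2] / digest[2:4] / digest[4:6] / digest[6:]
```

Because the key is a pure function of the URL, two different code paths that both resolve the same wheel URL will always agree on where the cached file lives.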
Thanks @dimbleby I'll take a look and see what I can do.
This is a serious problem with packages like PyTorch which are extremely large. Unless there's a workaround for this I will definitely never use Poetry.
Any update on a fix for this? I really like poetry but locking or adding a new dependency now takes > 5 minutes because I have to download wheels for torch, torchaudio, and torchvision. Is there a short-term workaround while a more permanent fix is made? Thank you.
I suspect many are reading this issue without actually having experienced the issue -- Poetry downloads Torch once for metadata + hashing, and a second time for actual installation. After the cache is created, Poetry will not re-download Torch. We are downloading distfiles more often than needed because two parts of the code do not share a common cache, but we are not downloading on every `poetry add` or anything similar.
> I suspect many are reading this issue without actually having experienced the issue -- Poetry downloads Torch once for metadata + hashing, and a second time for actual installation. After the cache is created, Poetry will not re-download Torch. We are downloading distfiles more often than needed as two parts of the code do not share a common cache, but we are not downloading every time `poetry add` occurs or anything similar.
Thanks for the reply. I am an active user of Poetry running 1.2.1, and I do experience the issue: the PyTorch wheel downloads every time I do an add or lock, and it takes around 80 seconds each time.
https://user-images.githubusercontent.com/47190785/195437531-0f111694-2fab-4f54-a8d8-73c46fcc9961.mp4
Every time I run `poetry update` in my project, a large spaCy model gets downloaded.
It is added to `[tool.poetry.dependencies]` this way:

```toml
en_core_web_lg = { url = "https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz" }
```
I think this is related: in a project I have these conditional URL dependencies defined:

```toml
torch = [
    {url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl", markers = "sys_platform == 'linux'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'x86_64'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"}
]
```

Every `poetry lock` operation ends up redownloading the 3 wheels, which are quite large. Isn't there a way to have them cached by Poetry?
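For context on why all three wheels get fetched: only one marker can match any given machine at install time, but during locking the solver needs metadata for every candidate, and without a shared cache that currently means downloading each file. The selection the markers express can be mirrored by a small hypothetical helper (this is illustrative only, not Poetry's resolver code):

```python
import platform
import sys
from typing import Optional

# Hypothetical mirror of the environment markers above.
TORCH_WHEELS = [
    ("https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl",
     lambda plat, mach: plat == "linux"),
    ("https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl",
     lambda plat, mach: plat == "darwin" and mach == "x86_64"),
    ("https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl",
     lambda plat, mach: plat == "darwin" and mach == "arm64"),
]


def select_wheel(plat: str = sys.platform,
                 mach: str = platform.machine()) -> Optional[str]:
    """Return the first URL whose marker matches the given platform."""
    for url, matches in TORCH_WHEELS:
        if matches(plat, mach):
            return url
    return None
```

So at install time exactly one (or zero) of the three URLs applies, yet the lock step still touches all of them.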
> I think this is related: in a project I have these conditional URL dependencies defined
>
> `torch = [ {url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl", markers = "sys_platform == 'linux'"}, {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'x86_64'"}, {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"} ]`
>
> Every `poetry lock` operation ends up redownloading the 3 wheels, which are quite large. Isn't there a way to have them cached by poetry?
On my system also, this seemed to make Poetry re-download `torch` every time it resolved dependencies. It did not happen with other dependencies that were given by name (to be downloaded from PyPI, with no URL).
Since PyTorch URLs have to be hard-coded to install properly, and PyTorch's wheel is more than 1 GB, this prevents me from migrating the team to Poetry.
Ah, looking at this, I realize that all the metadata caching happens in the repository layer. So if you're using direct URL dependencies, Poetry has no caching whatsoever. I personally got turned around here on whether this was a bug or as-designed behavior (currently, the latter is true).
Ideally the artifacts cache could be made agnostic to repositories so that it is keyed on URLs only and we can share it, as @dimbleby has mentioned. On top of that, I wonder if some mechanism to cache metadata (maybe a direct-URL `CachedRepository`?) could be implemented, as all that code is currently tied up with indexes.
I'm also experiencing this issue, and it's unfortunate, as I now have to choose between installing a specific wheel for my architecture,

```toml
torch = { url = "https://download.pytorch.org/whl/cpu/torch-1.9.0-cp38-none-macosx_11_0_arm64.whl", markers = "platform_machine == 'arm64' and platform_system == 'Darwin'" }
```

which is quicker to install but makes dependency resolution super slow, or installing the full `torch = "1.9.0"`, which takes longer to install but avoids the slow resolution times.
It would be great if there was caching for direct URL dependencies as well, as neither option is ideal right now 😞
I'm having the same problem in my project: if you have any packages with `{ url = ... }`, then every `poetry add`, `poetry lock`, or `poetry update` downloads them again. As a temporary solution, I'm using a requirements.txt for the URL packages and pyproject.toml for the rest, while waiting for a fix.
I think we've pretty firmly established what is going on and what is needed to improve -- I'd ask that people please refrain from "me too" as it's just adding noise right now.
@neersighted, I'm not just saying "me too"; I was sharing a workaround I'm currently using to help others with the same problem. In my Dockerfile I install the Poetry dependencies and the additional URL dependencies with pip into the same Python env. Please refrain from adding comments that just create noise in the topic, and try to read the comment to the end. Thank you.
I understand you are using tools other than Poetry to avoid our lack of caching here, yes. However, it does not add anything to this issue as it is essentially saying "I work around this deficiency in Poetry by not using Poetry" -- it's rather a tautology.
My comment is also not aimed at you in particular -- it's generalized and often made on high-traffic issues once the problem is defined.
Anyway, I'm going to minimize these comments as unhelpful; if anyone is interested in implementing the strategy @dimbleby/I have mentioned and needs guidance, please feel free to reach out via Discussions/Discord/a draft PR.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.