Caching conflicts when using `extra` dependencies

Open Ben-Epstein opened this issue 1 year ago • 2 comments

Description: This may relate to #626, and it may also conflict with your stated anti-goals, but I believe it's worth raising as a potential bug you may want to investigate. I can't find an existing issue that covers it, and it may be affecting many users who rely on this action.

If your Python project defines several extras, each with its own independent dependencies, and you want one job per extra to check that the extra pulls in everything it needs, alongside jobs for overall lint, type checking, and testing, you may run into this issue as I have.

When you enable caching (cache: poetry, cache: pip, etc.) and point it at your requirements.txt or the more up-to-date pyproject.toml, the cache key doesn't take into account what you are installing in that job.

So, if I have a pyproject like so

...
[tool.poetry.dependencies]
python = ">=3.10.9,<3.11"
numpy = "^1.22.3"
boto3 = "^1.24.59"
pydantic = {version = "<2.0", extras = ["dotenv"]}
jinja2 = "^3.1.2"

openai = {version = "0.28", optional = true}


[tool.poetry.extras]
openai = ["openai"]

And in my first job, I use setup-python and then run

poetry install --all-extras

but in another job, I run

poetry install

One may assume that openai will not be installed in the second job. But if I'm using caching, everything from the first cache creation will be installed, regardless of which install command I run.

I would think that the install command itself would generate the hash, rather than the dependency file itself.
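To make the suggestion concrete, here is a minimal sketch in Python of the two ways a key could be derived. It is not setup-python's actual implementation: the function names, the "poetry" prefix, and the sample file contents are all hypothetical and only illustrate the idea.

```python
import hashlib


def file_only_key(dep_file: bytes, prefix: str = "poetry") -> str:
    """Key derived only from the dependency file (the behavior described above)."""
    return f"{prefix}-{hashlib.sha256(dep_file).hexdigest()}"


def command_aware_key(dep_file: bytes, install_command: str, prefix: str = "poetry") -> str:
    """Hypothetical key that also hashes the install command."""
    h = hashlib.sha256()
    h.update(dep_file)
    h.update(b"\0")  # separator so file bytes and command cannot run together
    h.update(install_command.encode("utf-8"))
    return f"{prefix}-{h.hexdigest()}"


# Stand-in for the dependency file contents; both jobs read the same file.
dep_file = b'openai = {version = "0.28", optional = true}\n'
full = "poetry install --all-extras"
base = "poetry install"

# Both jobs hash the same file, so a file-only key is identical for them.
print(file_only_key(dep_file) == file_only_key(dep_file))  # True
# Including the command separates the two jobs into different cache entries.
print(command_aware_key(dep_file, full) == command_aware_key(dep_file, base))  # False
```

With a file-only key, the second job hits the cache written by the first. With a command-aware key, it would miss and build its own environment.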

Based on your non-goals, I understand if this isn't something you want to pursue, but it might be worth documenting in an overly clear way for users who may not understand this behavior upfront.

Thank you!

Ben-Epstein avatar Apr 02 '24 14:04 Ben-Epstein

Hello @Ben-Epstein, I have attempted to reproduce the issue on my end, but was unable to do so. In my test environment, the extras (openai) are not installed in the second job, which uses poetry install. Here are screenshots for your reference. Could you assist by sharing a link to a simplified version that reproduces the problem? Thank you! (Screenshots attached: 2024-04-18 at 5:36:01 PM and 5:35:14 PM.)

gowridurgad avatar Apr 19 '24 05:04 gowridurgad

Hello @Ben-Epstein, just a gentle reminder!

gowridurgad avatar May 03 '24 12:05 gowridurgad

Hi @Ben-Epstein, could you please assist by sharing a link to a simplified version that reproduces the problem? Thank you!

gowridurgad avatar May 15 '24 11:05 gowridurgad

Hi @gowridurgad, sorry about that. I will take a look today and try to reproduce it. Did you use poetry in that example? My project uses poetry, so I'll try that.

Ben-Epstein avatar May 15 '24 11:05 Ben-Epstein

Hi @gowridurgad I'm so sorry for the delay in the response.

I've reproduced the issue and shared it in this PR https://github.com/Ben-Epstein/poetry-setup-python-bug/pull/2

Here are the critical steps to reproduce:

  1. Kick off a job that installs all dependencies through poetry (i.e., poetry install --all-extras).
  2. After the cache from that job is created, change the install command to poetry install, without the extras. You'll see that there is a cache hit and that packages you do not expect are in fact installed.

You can see steps 1 and 2 in the following commits:

  1. This commit creates the cache; the repo then contains only one cache.
  2. This commit then allows the second job to run, and in the corresponding action you can see that it picked up the cache created in (1). The output of the second step, poetry run pip list, shows all of the extra dependencies that were installed by poetry install --all-extras in (1). They shouldn't be there, since this job runs poetry install; i.e., there should have been a cache miss.

Ben-Epstein avatar May 19 '24 22:05 Ben-Epstein

Hi @Ben-Epstein, the reason for this behavior is that the cache key didn't change between the two jobs, and the caching mechanism is designed to reuse a cache if it finds one with the same key. To avoid this situation, you might consider using actions/cache directly, with different cache keys for jobs that have different requirements. Here are screenshots for your reference. We will update the documentation accordingly. (Screenshots attached: 2024-05-21 at 5:21:53 PM and 5:22:19 PM.)

gowridurgad avatar Jun 03 '24 05:06 gowridurgad

Hello @Ben-Epstein, the PR has been merged, and the Anti-Goals for caching poetry dependencies have been updated in the documentation. For reference, see https://github.com/actions/setup-python/blob/main/docs/adrs/0000-caching-dependencies.md. Thank you!

gowridurgad avatar Jul 11 '24 12:07 gowridurgad