setup-python
PIP cache should cache the installed packages as well
Description:
Currently, setup-python caches only the ~/.cache/pip directory to avoid redownloads. However, it doesn't cache the installed packages. As some packages have lengthy installation steps, this leads to delays in builds.
You can see the current behaviour for example in https://github.com/crabhi/setup-python-cache-test/actions/runs/1789016634 (or in the attached build.txt): the pip install output shows "Collecting" and "Installing" instead of "Requirement already satisfied" for all packages.
Justification:
For example, installing the ansible package takes well over a minute even if it's already downloaded.
Are you willing to submit a PR? Yes, I can try.
Hello @crabhi, thanks for your request! We will look at it.
It would also be nice to follow https://github.com/actions/cache#outputs and provide an output cache-hit so we can use if: steps.[id].outputs.cache-hit != 'true' to avoid calling pip altogether.
In actions/cache this is done at https://github.com/actions/cache/blob/2d8d0d1c9b41812b6fd3d4ae064360e7d8762c7b/src/utils/actionUtils.ts#L25-L27 and https://github.com/actions/cache/blob/main/src/restore.ts#L55
This pattern affects more languages (actions/setup-node works the same: it only caches downloads, not installs). I would love to see a general consensus towards caching installs, not tarballs (perhaps behind a flag/attribute for future compat, e.g. cache: pip-install).
For pip I think it makes even more sense, as there are no postinstall actions. With setup-node I'm also caching node_modules instead of the packages, but it "broke" some flows where there was a postinstall script to configure other things (like pre-building some TypeScript scripts). The solution is simple: just run that script manually (or, in my case, cache the built scripts), but it's not "one size fits all".
For pip, AFAIR, there are no postinstall scripts, so this would not be an issue.
I'm experimenting with this at the moment, and caching site-packages (read: pip output) isn't straightforward either; for instance, binary wrappers (black, ...) won't work (python -m black works fine though). Might be one of those YMMV cases that makes it hard to standardize for everyone.
Hey, this feature was merged today and should be a part of the near-future release
But the cache-hit is just for the packages, not the installation, right? IOW: do I still need to call pip install?
Oh, you're talking about pip. Well yeah, then you'll have to wait for this action to support caching venvs out of the box. It's the case for pipenv and poetry though. The best thing for now is to manually cache
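To make the distinction in this exchange concrete, here is a minimal sketch (not taken from the thread). It assumes a requirements.txt at the repository root and uses a hypothetical step id, setup. With cache: 'pip', the cache-hit output only reports whether the pip download cache was restored, so pip install still has to run; it merely skips the downloads.

```yaml
steps:
  - uses: actions/checkout@v3
  - id: setup  # hypothetical step id
    uses: actions/setup-python@v4
    with:
      python-version: '3.10'
      cache: 'pip'
  # cache-hit refers to the pip download cache only,
  # not to anything installed into site-packages.
  - run: |
      echo "pip download cache hit: ${{ steps.setup.outputs.cache-hit }}"
  # The install step therefore runs unconditionally.
  - run: python -m pip install -r requirements.txt
```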
I have a case where building packages for pypy (grpcio, grpcio-tools) takes about 6 minutes; that's way too slow to introduce a matrix.
If anyone has a manual example using actions/cache, please share it.
I was creating a python venv and then caching that directory, however I hit an issue where that was broken once restored (behaviour was inconsistent).
I currently have a job that takes ~6 min to complete, 4 min of which is installation of pip packages. An effective caching of installed packages would be a great boost.
Could you share the workflow so people can take a look at it? I think it's possible to hack around this while the feature is not here.
- uses: actions/checkout@v3
- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7
- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    python3 -m venv venv
- run: |
    venv/bin/python3 -m pip install -r requirements.txt
This worked quite well for me for the most part, but after a while I started getting errors such as:
Error: [Errno 2] No such file or directory: '/home/runner/work/myrepo/myrepo/venv/bin/python3': '/home/runner/work/myrepo/myrepo/venv/bin/python3'
@rashidnhm have you tried debugging this issue? It seems like the problem may be not in this action.
So weirdly enough, I have not been able to reproduce the issue. To fix it, I simply removed the venv code, then recreated and re-cached the venv. I'm not even sure what caused it in the first place.
My only thought was that maybe the cache somehow got corrupted and it kept restoring that. Really can't say.
For now I've kept the code I sent above; it's been working well since and I haven't hit any other issues.
Ok, nice. The code seemed ok, so that was strange. I'd only advise you not to run pip install if the cache was hit: you don't want to modify the cached venv in any way after a hit, to avoid corruption.
So I have done quite a deep dive into the venv corruption issue, and I believe I know what happened, and how to avoid it as well.
The version of Python between when my cache was created and when it was restored changed. And I had a generic restore key which matched the old cache key. See detailed explanation below.
This is how my YAML file looked when I hit this error:
# BAD CONFIG DO NOT USE (Illustrative purposes only)
- uses: actions/checkout@v3
- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7
- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
    # The generic "pip-" restore key (last line below) was specifically the cause of the issue
    restore-keys: |
      pip-${{ steps.setup_python.outputs.python-version }}-
      pip-
- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    python3 -m venv venv
- run: |
    venv/bin/python3 -m pip install -r requirements.txt
When this workflow initially ran and saved the venv to cache, the latest release of Python 3.7 was 3.7.12, meaning the venv created had symlinks to 3.7.12.
However, a few days later when the workflow ran again, the latest release of Python 3.7 was 3.7.13.
Notice that in my workflow I don't pin my Python patch version, so actions/setup-python downloaded the latest available patch release of Python 3.7 (as expected).
However, my restore-key pip- matched the old cache, which restored the old venv created for Python 3.7.12, meaning all the symlinks inside were now broken! I had set up Python 3.7.13 but was trying to use a venv with symlinks to 3.7.12. That is why, when I tried to call the python executable from the venv, it could not find the file.
The resolution is to really ensure that the output of setup python is always part of the cache key. So any change in python version (even a patch version bump) would create a new cache key.
This is the code I have now, it has been working well without any issues. I have updated the workflow with the advice @dhvcc gave in the above comment. The venv is not touched if there is a cache hit.
- uses: actions/checkout@v3
- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7
- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    # Check if venv exists (restored from secondary keys if any, and delete)
    # You might not need this line if you only have one primary key for the venv caching
    # I kept it in my code as a fail-safe
    if [ -d "venv" ]; then rm -rf venv; fi
    # Re-create the venv
    python3 -m venv venv
    # Install dependencies
    venv/bin/python3 -m pip install -r requirements.txt
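As an extra fail-safe against the broken-symlink situation described above, one option is to test the restored interpreter before relying on it. This is only a sketch under assumptions: it would take the place of the conditional step, it relies on the default bash shell of a Linux runner, and it expects requirements.txt at the repository root.

```yaml
- run: |
    # If the venv's interpreter is missing or cannot start
    # (for example, dangling symlinks), rebuild the venv from scratch.
    if ! venv/bin/python3 -c "import sys" 2>/dev/null; then
      rm -rf venv
      python3 -m venv venv
      venv/bin/python3 -m pip install -r requirements.txt
    fi
```

Note that if a broken venv was restored on an exact key hit, the rebuilt venv is not saved again, because actions/cache skips saving when the primary key already exists.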
Hi, @rashidnhm 👋 Thanks a lot for such a detailed explanation, it should help others who encountered such issues.
Any news on how to flag to cache the installed packages, and not only the downloaded ones, with actions/setup-python@v4? I am not seeing any flags for that in the documentation.
What do you exactly mean by that? A bit more context would be helpful to avoid misunderstandings
Sorry, @dhvcc, if I didn't manage to make myself clear. actions/setup-python@v4 uses actions/cache@v3 under the hood, so users do not need to call the actions/cache@v3 action themselves in an example such as:
- uses: actions/checkout@v3
- name: Set up Python 3.10 and caches
  id: setup_and_cache
  uses: actions/setup-python@v4
  with:
    python-version: '3.10'
    cache: 'pip'
It would be great if the installed packages could be cached as well (the purpose of this issue #330) through actions/setup-python@v4
I wonder if it's actually worth it.
Here I cached the content of ${{ env.pythonLocation }}/lib/site-packages and ${{ env.pythonLocation }}/Scripts using actions/cache:
No caching:
Caching, no cache hit (+1m):
Caching, cache hit (+18s):
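For reference, the setup described in this comment could look roughly like the sketch below. The two paths are the ones named above (a Windows layout); the step id, the cache key, and requirements.txt are assumptions, since the comment does not show them.

```yaml
- uses: actions/setup-python@v4
  with:
    python-version: '3.10'
# Hypothetical step id and key; only the two paths come from the comment.
- id: site_packages_cache
  uses: actions/cache@v3
  with:
    path: |
      ${{ env.pythonLocation }}/lib/site-packages
      ${{ env.pythonLocation }}/Scripts
    key: site-packages-${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}
# Install only when nothing was restored.
- if: steps.site_packages_cache.outputs.cache-hit != 'true'
  run: python -m pip install -r requirements.txt
```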
@Avasam possibly at least less strain on pypi. Also we should test small and big amounts of dependencies
Just wanted to add an anecdote of my own experience. TorchGeo has a long list of dependencies:
Install times without caching vary quite a bit by OS and Python version:
Python | Linux | macOS | Windows |
---|---|---|---|
3.10 | 2m 30s | 2m 23s | 5m 4s |
3.9 | 2m 50s | 4m 50s | 5m 49s |
3.8 | 2m 29s | 2m 12s | 3m 19s |
We first tried using the cache feature of setup-python:
- name: Set up python
  uses: actions/[email protected]
  with:
    python-version: ${{ matrix.python-version }}
    cache: 'pip'
    cache-dependency-path: |
      requirements/required.txt
      requirements/datasets.txt
      requirements/tests.txt
- name: Install pip dependencies
  run: pip install -r requirements/required.txt -r requirements/datasets.txt -r requirements/tests.txt
Not only do install times not significantly improve, in many cases it's actually worse!
Python | Linux | macOS | Windows |
---|---|---|---|
3.10 | 2m 42s | 1m 53s | 5m 50s |
3.9 | 2m 50s | 2m 11s | 5m 46s |
3.8 | 2m 39s | 3m 21s | 2m 35s |
Finally, we tried the setup proposed in this blog that manually caches the entire Python installation:
- name: Set up python
  uses: actions/[email protected]
  with:
    python-version: ${{ matrix.python-version }}
- name: Cache dependencies
  uses: actions/[email protected]
  id: cache
  with:
    path: ${{ env.pythonLocation }}
    key: ${{ env.pythonLocation }}-${{ hashFiles('requirements/required.txt') }}-${{ hashFiles('requirements/datasets.txt') }}-${{ hashFiles('requirements/tests.txt') }}
- name: Install pip dependencies
  if: steps.cache.outputs.cache-hit != 'true'
  run: pip install -r requirements/required.txt -r requirements/datasets.txt -r requirements/tests.txt
This resulted in significantly faster installation times, which could likely be further improved by only caching the site-packages directory:
Python | Linux | macOS | Windows |
---|---|---|---|
3.10 | 38s | 39s | 4m 11s |
3.9 | 53s | 45s | 4m 20s |
3.8 | 1m 2s | 1m 15s | 1m 21s |
Apparently slower Windows caching is a known issue: https://github.com/actions/cache/issues/752.
So yes, if setup-python also cached installed packages, that would be awesome!
Regarding "which could likely be further improved by only caching the site-packages directory": in hindsight, this is a bad idea. Many tools like black or flake8 also install files into bin, so we'll at least need to cache bin too.
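A sketch of what caching both directories might look like on a Linux runner, assuming the usual layout under pythonLocation (lib/pythonX.Y/site-packages and bin). The step id, the key, and requirements.txt are placeholders, and the wildcard relies on actions/cache accepting glob patterns in path.

```yaml
- uses: actions/cache@v3
  id: installed_packages  # hypothetical step id
  with:
    # site-packages holds the importable modules; bin holds the
    # console-script wrappers (black, flake8, ...).
    path: |
      ${{ env.pythonLocation }}/lib/python*/site-packages
      ${{ env.pythonLocation }}/bin
    key: ${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}
- if: steps.installed_packages.outputs.cache-hit != 'true'
  run: python -m pip install -r requirements.txt
```

Keeping pythonLocation in the key matters here for the same reason as with venvs: the wrappers in bin point at one specific interpreter path.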
I addressed this point a while ago (above) - recap here:
I'm experimenting with this at the moment, and caching site-packages (read: pip output) isn't straightforward either; for instance, binary wrappers (black, ...) won't work (python -m black works fine though). Might be one of those YMMV cases that makes it hard to standardize for everyone.
So, instead of invoking black, do python -m black.
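In workflow terms, the workaround amounts to something like this (a sketch; the --check . arguments are just an example invocation):

```yaml
# Relies on the console-script wrapper in bin/ (Scripts/ on Windows),
# which is missing when only site-packages was restored:
# - run: black --check .

# Runs the module straight from site-packages, so no wrapper is needed:
- run: python -m black --check .
```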
That's a decent workaround, but I don't think it's realistic to expect all users to change how they invoke other steps later in their workflow. I think we would have to cache bin too. Possibly everything. Bonus of caching everything is that we have to install Python from a cache anyway.
Most definitely not a catch-all! To be honest, I'm not confident there's a straightforward solution.
The workaround from @adamjstewart seems to work wonders indeed! But I think a standard implementation in this repository would be a great addition. Any updates on it from the dev team?
This right here has been a life saver for me - I toiled over this caching for so long, but this got me there!! Thank you so so so much!!