
Self-hosted runners cache/reuse OS-incompatible python installations

Open OscarVanL opened this issue 11 months ago • 6 comments

Description: Our GitHub Actions runners are self-hosted.

Some of our CI jobs run in different operating systems using the "Run jobs in a container" functionality, for example when they need a specific OS or are testing against several operating systems in a test matrix.

When running the actions/setup-python action, it will either download Python and cache it, or use the existing cached python installation.

The problem is, while the download URL is an OS-specific download, the cache directory is not OS-specific. For example, these python downloads are different, but both get cached in /opt/actions-runner/_work/_tool/Python/3.8.18/x64.

  • https://github.com/actions/python-versions/releases/download/3.8.18-12303122501/python-3.8.18-linux-24.04-x64.tar.gz
  • https://github.com/actions/python-versions/releases/download/3.8.18-12303122501/python-3.8.18-linux-20.04-x64.tar.gz

The cache destination gets reused across jobs irrespective of what operating system the job is running on, which leads to the python installation breaking.
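To make the collision concrete, here is a minimal sketch. It assumes the cache layout is `<root>/Python/<version>/<arch>`, as the paths in the logs suggest; the helper names `cache_dir` and `os_tag` are hypothetical and are not part of setup-python. It shows that the two archives differ in their OS tag while resolving to the same cache directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of setup-python.

# Cache layout as it appears in the logs: <root>/Python/<version>/<arch>
cache_dir() {
  printf '%s/Python/%s/%s\n' "$1" "$2" "$3"
}

# Extract the OS tag (e.g. linux-20.04) from a python-versions archive URL.
os_tag() {
  local name="${1##*/}"        # drop the URL path
  name="${name%.tar.gz}"       # python-3.8.18-linux-20.04-x64
  name="${name#python-*-}"     # linux-20.04-x64
  printf '%s\n' "${name%-*}"   # linux-20.04
}

base="https://github.com/actions/python-versions/releases/download/3.8.18-12303122501"
for url in "$base/python-3.8.18-linux-24.04-x64.tar.gz" \
           "$base/python-3.8.18-linux-20.04-x64.tar.gz"; do
  # The OS tag differs per archive, but the cache directory does not.
  echo "$(os_tag "$url") -> $(cache_dir /opt/actions-runner/_work/_tool 3.8.18 x64)"
done
```

Both iterations print the same destination directory, which is why whichever job runs first decides which build every later job gets.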

Example

  • Job 1. Running in an Ubuntu 20.04 container

Cache miss, python gets downloaded and cached...

  Version 3.8 was not found in the local cache
  Version 3.8 is available for downloading
  Download from "https://github.com/actions/python-versions/releases/download/3.8.18-12303122501/python-3.8.18-linux-20.04-x64.tar.gz"
  Extract downloaded archive
  /usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /__w/_temp/bba86c99-5e24-4327-a571-cb48396a8d3a -f /__w/_temp/568681cf-5876-4ee5-b706-acfdd405f2f3
  Execute installation script
  Check if Python hostedtoolcache folder exist...
  Creating Python hostedtoolcache folder...
  Create Python 3.8.18 folder
  Copy Python binaries to hostedtoolcache folder
  Create additional symlinks (Required for the UsePythonVersion Azure Pipelines task and the setup-python GitHub Action)
  Upgrading pip...
  Looking in links: /tmp/tmpmip5jefu
  Requirement already satisfied: setuptools in /__w/_tool/Python/3.8.18/x64/lib/python3.8/site-packages (56.0.0)
  Requirement already satisfied: pip in /__w/_tool/Python/3.8.18/x64/lib/python3.8/site-packages (23.0.1)
  Collecting pip
  Downloading pip-25.0.1-py3-none-any.whl (1.8 MB)
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 9.8 MB/s eta 0:00:00
  
  Installing collected packages: pip
  Attempting uninstall: pip
  Found existing installation: pip 23.0.1
  Uninstalling pip-23.0.1:
  Successfully uninstalled pip-23.0.1
  Successfully installed pip-25.0.1
  Create complete file
  Successfully set up CPython (3.8.18)

  • Job 2. Running in an Ubuntu 24.04 container on the same runner

Cache hit (of the incorrect OS's python!):

Run actions/setup-python@v5
  with:
    python-version: 3.8
    check-latest: false
    token: ***
    update-environment: true
    allow-prereleases: false
    freethreaded: false
Installed versions
  Successfully set up CPython (3.8.18)

Run python -m pip install --upgrade pip
  python -m pip install --upgrade pip
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    pythonLocation: /opt/actions-runner/_work/_tool/Python/3.8.18/x64
    PKG_CONFIG_PATH: /opt/actions-runner/_work/_tool/Python/3.8.18/x64/lib/pkgconfig
    Python_ROOT_DIR: /opt/actions-runner/_work/_tool/Python/3.8.18/x64
    Python2_ROOT_DIR: /opt/actions-runner/_work/_tool/Python/3.8.18/x64
    Python3_ROOT_DIR: /opt/actions-runner/_work/_tool/Python/3.8.18/x64
    LD_LIBRARY_PATH: /opt/actions-runner/_work/_tool/Python/3.8.18/x64/lib
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Requirement already satisfied: pip in /opt/actions-runner/_work/_tool/Python/3.8.18/x64/lib/python3.8/site-packages (25.0.1)
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping

The above error is likely due to Ubuntu 20.04 and Ubuntu 24.04 shipping incompatible OpenSSL versions (1.1 vs 3), so the cached build's ssl module cannot load in the 24.04 container.
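A quick way to confirm this failure mode before running pip is to check whether the interpreter can import `ssl` at all. The following is a sketch only: `python_has_ssl` is a hypothetical helper, and the `bin/python` location under `pythonLocation` is an assumption about the tool cache layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the given interpreter can import ssl.
python_has_ssl() {
  "$1" -c 'import ssl' >/dev/null 2>&1
}

# pythonLocation is exported by setup-python (see the env block in the log above).
# The fallback path and the bin/python suffix are assumptions for illustration.
py="${pythonLocation:-/opt/actions-runner/_work/_tool/Python/3.8.18/x64}/bin/python"

if python_has_ssl "$py"; then
  echo "ssl module OK in $py"
else
  echo "ssl module unavailable in $py; the cached build may target a different OS" >&2
fi
```

Run as an extra step right after actions/setup-python, this surfaces the mismatch immediately instead of after five pip retries.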

The same can happen in the other direction. When the Ubuntu 24.04 job ran before the Ubuntu 20.04 job, I encountered this error instead:

Run python -m pip install --upgrade pip
python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by python)
python: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0)
python: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0)
python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0)
python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0)
python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0)
python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0)

In this case, the failure occurs because the Python release built for Ubuntu 24.04 is dynamically linked against a newer glibc than the one available on Ubuntu 20.04.
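The glibc requirement can be compared against the container's glibc directly. This is a rough sketch: `max_glibc` is a hypothetical helper that reads any text on stdin (error output like the log above, or `objdump -T` output) and prints the highest `GLIBC_x.y` version it mentions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest GLIBC_x.y version found on stdin.
max_glibc() {
  grep -o 'GLIBC_[0-9][0-9.]*' \
    | sed 's/^GLIBC_//' \
    | sort -u -t. -k1,1n -k2,2n \
    | tail -n 1
}

# Example usage inside the failing container (paths taken from the log above):
#   objdump -T /__w/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0 | max_glibc
#   getconf GNU_LIBC_VERSION    # glibc version provided by this container
```

If the first number is higher than the second, the cached build was produced for a newer OS than the one the job is running in.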

Action version: actions/setup-python@v5

Platform:

  • [x] Ubuntu
  • [ ] macOS
  • [ ] Windows

Runner type:

  • [ ] Hosted
  • [x] Self-hosted

Tools version: I expect all python versions may be affected

Repro steps:

  1. Create a single self-hosted GitHub Actions runner.
  2. Run a CI job that is using Ubuntu 24.04:
       i. Run actions/setup-python@v5 targeting Python 3.8.
  3. Wait for the above job to finish.
  4. Run a CI job that is using Ubuntu 20.04:
       i. Run actions/setup-python@v5 targeting Python 3.8.
       ii. Run python -m pip install --upgrade pip.

The above should fail. The python installation is broken as it is incompatible with that Ubuntu version.

The problem can be reproduced if you swap the operating systems around too (Ubuntu 20 first, then Ubuntu 24), but a different python error will happen.

Expected behavior:

If an OS-specific python release is utilised in the download, then this should be cached into an OS-specific location on the CI runner to ensure an equivalent cached python release is utilised.

Actual behavior: In a multi-OS environment on a shared self-hosted runner, actions/setup-python is likely to reuse a cached Python installation that was built for a different OS.

OscarVanL avatar Apr 17 '25 13:04 OscarVanL

Hello @OscarVanL, Thank you for creating this issue and we will look into it :)

aparnajyothi-y avatar Apr 18 '25 06:04 aparnajyothi-y

@chiranjib-swain I see you've written some test cases that look to try to reproduce the issue I described (here and here).

Were you able to successfully reproduce this bug? Is there any extra information you would like from my side?

Thanks!

OscarVanL avatar Apr 23 '25 12:04 OscarVanL

Hi @OscarVanL ,

Thanks for reaching out!

> Were you able to successfully reproduce this bug? Is there any extra information you would like from my side?

Yes, I have already reproduced it and am sharing the screenshots below.

Image

Image

I've tested the workflow setup using the AGENT_TOOLSDIRECTORY environment variable as per the documentation on both Ubuntu 20.04 and 24.04, and it's working as expected.

The AGENT_TOOLSDIRECTORY environment variable is dynamically set before the actions/setup-python step to ensure that the tool cache is properly isolated per OS version. This is especially important when using self-hosted runners with containers and switching between environments like Ubuntu 20.04 and 24.04.

Key Points for Setup

1. Dynamic AGENT_TOOLSDIRECTORY Configuration

Define a unique tool cache path for each OS early in your workflow to prevent conflicts:

   - name: Set AGENT_TOOLSDIRECTORY
     run: echo "AGENT_TOOLSDIRECTORY=${{ runner.temp }}/tools-ubuntu-24-04" >> "$GITHUB_ENV"

Make sure this step is run before the actions/setup-python step to ensure the tool cache is used correctly during setup.
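Rather than hard-coding the suffix in every job, the `run:` script could derive it from the container itself. The sketch below assumes the container provides `/etc/os-release` with `ID` and `VERSION_ID` fields (true for Ubuntu images); `os_suffix` is a hypothetical helper name, and `RUNNER_TEMP` is used as the environment-variable form of `runner.temp`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a suffix such as "ubuntu-24-04" from an os-release file.
os_suffix() {
  # Source the file in a subshell so its variables do not leak into the caller.
  (
    . "${1:-/etc/os-release}"
    printf '%s-%s\n' "$ID" "$VERSION_ID" | tr '.' '-'
  )
}

# Inside a workflow step that runs before actions/setup-python.
# The guard keeps the script harmless when run outside GitHub Actions.
if [ -n "${GITHUB_ENV:-}" ]; then
  echo "AGENT_TOOLSDIRECTORY=${RUNNER_TEMP:-/tmp}/tools-$(os_suffix)" >> "$GITHUB_ENV"
fi
```

With this in place, the same step can be shared across matrix entries, and each container image ends up with its own tool cache directory.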

2. Permission Setup

Your runner must have write access to the directory specified by AGENT_TOOLSDIRECTORY.
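One way to catch a permission problem early is to create and test the directory in a step of its own. This is a small sketch; `ensure_writable_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the directory if needed and verify it is writable.
ensure_writable_dir() {
  mkdir -p "$1" 2>/dev/null && [ -w "$1" ]
}

# Only check when the variable has actually been set for this job.
if [ -n "${AGENT_TOOLSDIRECTORY:-}" ]; then
  if ensure_writable_dir "$AGENT_TOOLSDIRECTORY"; then
    echo "tool cache directory is writable: $AGENT_TOOLSDIRECTORY"
  else
    echo "cannot write to $AGENT_TOOLSDIRECTORY" >&2
    exit 1
  fi
fi
```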

3. Workflow Compatibility

I've tested this configuration with actions/setup-python on both OS versions, and all workflows ran successfully without tool cache issues. Please find attached screenshots of my run below for your reference.

Image

Image

Let me know if you have any other issues or need more help!

chiranjib-swain avatar Apr 24 '25 03:04 chiranjib-swain

Hi @chiranjib-swain,

Thanks for the reply and the workaround, this should help me get unblocked for now.

Long term, perhaps it is worth making some changes to the directory that actions/setup-python uses to cache the toolchain.

For example, instead of storing in /opt/actions-runner/_work/_tool/Python/3.8.18/x64, it makes more sense to store in a path more similar to that used for the download link, like /opt/actions-runner/_work/_tool/Python/3.8.18-linux-24.04-x64.
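As a sketch of the proposed layout (not how setup-python currently builds its paths), the OS tag from the download name would simply become part of the cache key. `os_cache_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OS-qualified cache path, <root>/Python/<version>-<os_tag>-<arch>.
os_cache_dir() {
  local root="$1" version="$2" os_tag="$3" arch="$4"
  printf '%s/Python/%s-%s-%s\n' "$root" "$version" "$os_tag" "$arch"
}

root="/opt/actions-runner/_work/_tool"
for tag in linux-20.04 linux-24.04; do
  # Each OS tag now resolves to its own directory.
  os_cache_dir "$root" 3.8.18 "$tag" x64
done
```

With the OS tag in the path, the Ubuntu 20.04 and 24.04 builds no longer share a directory, so neither job can pick up the other's installation.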

This would be a long term fix that would avoid the need to assign the AGENT_TOOLSDIRECTORY on every job.

Thanks again!

OscarVanL avatar Apr 24 '25 06:04 OscarVanL

Hi @OscarVanL ,

Implementing OS-specific identifiers in the toolchain cache subdirectory structure (e.g., changing /Python/3.8.18/x64 to /Python/3.8.18-linux-20.04/x64) is a good suggestion to avoid conflicts between Python installations and dependencies across different operating systems. Although this is a breaking change that may initially result in cache misses, it can be treated as a feature request for future planning once we gather some feedback.

Thank You!

chiranjib-swain avatar Apr 29 '25 05:04 chiranjib-swain

@chiranjib-swain

This issue has just happened again for a colleague of mine in a different team completely independently.

I personally disagree with the view that this is a feature request; in my opinion, it is a design flaw in the caching strategy used by actions/setup-python.

I also disagree that it is a breaking change to fix this. At worst, it will result in a single cache-miss for existing setups, which I think is more than justified by the upsides and correctness.

If you insist it is a feature request, please can you ask for this feature request to be considered?

OscarVanL avatar Oct 15 '25 12:10 OscarVanL