
Caching a conda environment

Open lminer opened this issue 3 years ago • 12 comments

It would be great if it were possible to cache a conda environment. I see from here that it is possible for a vanilla python environment.

lminer avatar Mar 23 '21 18:03 lminer

It is feasible to do so, but native support is unlikely to be added to this action.

My recommendation would be to:

  • try to restore an actions/cache of a conda-pack archive, hashed off your environment.yml
    • if that hits an empty cache
      • use setup-miniconda to make an environment from the environment.yml
        • make sure it has conda and conda-pack in it
      • use conda-pack to make a relocatable archive of the environment before you do anything to it
        • like install your system-under-test
    • if it succeeds
      • unpack the conda-pack
      • use setup-miniconda with $CONDA set to the unpacked env
      • be fast

This approach avoids a number of gotchas with caching conda tarballs, etc.

bollwyvl avatar Mar 23 '21 20:03 bollwyvl

Do you have any suggestions of an example I might look at for how to do this? My github-actions-fu is quite weak.

lminer avatar Mar 24 '21 00:03 lminer

Here is our example to cache /usr/share/miniconda/envs. Updated

      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: "xxx"
          auto-activate-base: false
          use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!

      - name: Cache conda envs and other stuff
        id: conda
        uses: actions/cache@v2
        env:
          # Increase this value to manually reset cache if setup/environment-linux.yml has not changed
          CONDA_CACHE_NUMBER: 1
        with:
          path: |
            /usr/share/miniconda/envs/xxx
          key: ${{ runner.os }}-conda-${{ env.CONDA_CACHE_NUMBER }}-${{ hashFiles('setup/environment-template.yml', 'setup/*.sh') }}

      - name: Run install script
        # Only need to run install when deps has been changed
        if: steps.conda.outputs.cache-hit != 'true'
        run: |
          ./install     # <----- conda packages are installed here via `conda env update -f ...`

bitphage avatar Mar 30 '21 08:03 bitphage

@bitphage it's failing for me right now at the last step. I don't have a complicated install process, so I just substituted ./install with conda env create -f environment.yml and I got the error:

Could not find conda environment: myenv
You can list all discoverable environments with `conda info --envs`.

This also happens if I do conda env update -f environment.yml. Any idea what I might be doing incorrectly?

lminer avatar Mar 30 '21 16:03 lminer

@lminer hmm, make sure that you have activate-environment: myenv in the conda-incubator/setup-miniconda@v2 step.

bitphage avatar Mar 30 '21 16:03 bitphage

@bitphage I have. This is what it looks like:

      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: "myenv"
          auto-activate-base: false
          use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!

      # Remove envs directory if exists to prevent cache restore errors. Github runner already has bundled conda.
      - name: Remove envs directory
        run: rm -rf /usr/share/miniconda/envs

      - name: Cache conda envs and other stuff
        id: conda
        uses: actions/cache@v2
        env:
          # Increase this value to manually reset cache if setup/environment-linux.yml has not changed
          CONDA_CACHE_NUMBER: 1
        with:
          path: |
            ~/conda_pkgs_dir
            /usr/share/miniconda/envs
          key: ${{ runner.os }}-conda-${{ env.CONDA_CACHE_NUMBER }}-${{ hashFiles('environment.yml') }}

      - name: Run install script
        # Only need to run install when deps has been changed
        if: steps.conda.outputs.cache-hit != 'true'
        run: |
          conda env create -f environment.yml

lminer avatar Mar 30 '21 18:03 lminer

@lminer ok, I was fixing some issues that appeared after the recent 2.1.0 release of the setup-miniconda action, and I've updated the example above. Note that there is no rm -rf step anymore, and the cache path should be /usr/share/miniconda/envs/myenv to avoid cache restore errors.
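For reference, a minimal sketch of the cache step with that path, assuming the environment is named myenv and the key hashes environment.yml as in the snippet quoted above:

```yaml
      - name: Cache conda env
        id: conda
        uses: actions/cache@v2
        env:
          # Increase this value to manually reset the cache if environment.yml has not changed
          CONDA_CACHE_NUMBER: 1
        with:
          # Cache only the named environment, not the whole envs directory
          path: /usr/share/miniconda/envs/myenv
          key: ${{ runner.os }}-conda-${{ env.CONDA_CACHE_NUMBER }}-${{ hashFiles('environment.yml') }}
```

The install step then stays guarded by `if: steps.conda.outputs.cache-hit != 'true'`, so it only runs when the cache misses.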

bitphage avatar Mar 31 '21 06:03 bitphage

@bitphage thanks! it's working now. Just shaved 4 minutes off the runtime.

lminer avatar Mar 31 '21 18:03 lminer

This example is helpful! I noticed it has one more step than the one in the README - is it recommended that everyone add the "Run install script" step? If so, could the example in the README be updated?

sam-hoffman avatar May 05 '21 19:05 sam-hoffman

My recommendation would be to:

* try to restore an `actions/cache` of a [`conda-pack`](https://conda.github.io/conda-pack/) archive, hashed off your `environment.yml`
  * if that hits an empty cache
    * use `setup-miniconda` to make an environment from the `environment.yml`
      * make sure it has `conda` and `conda-pack` in it
    * use `conda-pack` to make a relocatable archive of the environment _before_ you do anything to it
      * like install your system-under-test
  * if it succeeds
    * unpack the conda-pack
    * use `setup-miniconda` with `$CONDA` set to the unpacked env
    * be fast

This approach avoids a number of gotchas with caching conda tarballs, etc.

I tried this approach and this is the gist of what I ended up with so far, working:

name: Conda Environment Caching Example
on: workflow_dispatch

env:
  # Increase this value to reset cache if environment.yml has not changed.
  PY_CACHE_NUMBER: 0
  PY_ENV: my_env

jobs:
  setup-python:
    name: Setup Python Environment
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Git checkout
        uses: actions/checkout@v2
      - name: Cache Python environment
        id: cache-python
        uses: actions/cache@v2
        with:
          path: "${{ env.PY_ENV }}.tar.gz"
          key:
            ${{ runner.os }}-${{ env.PY_CACHE_NUMBER }}-${{ hashFiles('**/environment.yml') }}
      - name: Install Python dependencies
        if: steps.cache-python.outputs.cache-hit != 'true'
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          use-mamba: true
          auto-update-conda: false
          activate-environment: ${{ env.PY_ENV }}
          environment-file: environment.yml
          auto-activate-base: false
      - name: Pack Python environment
        if: steps.cache-python.outputs.cache-hit != 'true'
        run: |
          conda pack --force -n ${{ env.PY_ENV }}

  use-cached-python:
    name: Use cached Python
    needs: [setup-python]
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Git checkout
        uses: actions/checkout@v2
      - name: Get Python cache
        id: python-cache
        uses: actions/cache@v2
        with:
          path: "${{ env.PY_ENV }}.tar.gz"
          key:
            ${{ runner.os }}-${{ env.PY_CACHE_NUMBER }}-${{ hashFiles('**/environment.yml') }}
      - name: Unpack Python environment
        run: |
          mkdir -p "${{ env.PY_ENV }}"
          tar -xzf "${{ env.PY_ENV }}.tar.gz" -C "${{ env.PY_ENV }}"
          source "${{ env.PY_ENV }}/bin/activate"
          conda-unpack
      - name: Run Python
        run: |
          source "${{ env.PY_ENV }}/bin/activate"
          python -c 'import sys; print(sys.version_info[:])'

In my setup, several jobs share the same Python environment, which is why I separated the setup from the execution. With conda-pack, every job that uses the cache has to run on the same OS as the job that built the archive. In my environment.yml I have added conda and conda-pack, and the only entry under channels is conda-forge.
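A hypothetical environment.yml matching that description could look like the sketch below. The environment name follows PY_ENV from the workflow; the Python pin is an assumption added for illustration and is not taken from the original file.

```yaml
# environment.yml (illustrative sketch, not the original file)
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.9   # assumed version pin, adjust as needed
  - conda        # included in the environment, as recommended above
  - conda-pack   # provides the `conda pack` command used to build the archive
```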

In the second job I first tried using setup-miniconda after unpacking with:

      - name: Activate Python environment
        uses: conda-incubator/setup-miniconda@v2
        env:
          CONDA: my_env
        with:
          activate-environment: ${{ env.PY_ENV }}
          auto-activate-base: false

but that didn't give the result I expected. Instead, it created a new environment in my_env/envs/my_env. I was hoping to avoid having to source "${{ env.PY_ENV }}/bin/activate" in each step after unpacking.
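One possible way around the repeated source call, sketched here and not confirmed in this thread, is to append the unpacked environment's bin directory to $GITHUB_PATH once, so later steps in the same job find its python on PATH. This assumes PATH alone is enough for the tools being called: it does not run the environment's activation scripts or set CONDA_PREFIX, so packages that depend on those may still need a full activate.

```yaml
      - name: Unpack Python environment and add it to PATH
        shell: bash
        run: |
          mkdir -p "${{ env.PY_ENV }}"
          tar -xzf "${{ env.PY_ENV }}.tar.gz" -C "${{ env.PY_ENV }}"
          source "${{ env.PY_ENV }}/bin/activate"
          conda-unpack
          # Entries appended to $GITHUB_PATH are prepended to PATH in later steps of this job
          echo "$PWD/${{ env.PY_ENV }}/bin" >> "$GITHUB_PATH"
      - name: Run Python
        # Plain bash (no login shell) so profile scripts cannot reorder PATH
        shell: bash
        run: |
          python -c 'import sys; print(sys.executable)'
```

Printing sys.executable in the last step is a quick check that the interpreter comes from the unpacked environment and not from the runner's system Python.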

OlafHaag avatar May 23 '21 17:05 OlafHaag

  • use setup-miniconda with $CONDA set to the unpacked env

I am also trying to set up a GitHub Actions workflow with setup-miniconda and conda-pack, but I don't get that part. Does someone have a quick example snippet?

hadim avatar Jun 13 '21 23:06 hadim

Hi all. I was interested in this as well, and I ended up with this GitHub Actions workflow, based in part on @OlafHaag's, which runs on Ubuntu, macOS and Windows. I'm sharing it here in case it's useful to others.

name: tests
on:
  push:
  pull_request:
    types: [opened, reopened]

env:
  # Increase this value to reset cache if environment.yml has not changed.
  PY_CACHE_NUMBER: 2
  PY_ENV: cm_gene_expr

jobs:
  pytest:
    name: Python tests
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 4
      fail-fast: false
      matrix:
        python-version: [3.9]
        os: [ubuntu-latest, macOS-latest, windows-latest]
    steps:
      - name: Checkout git repo
        uses: actions/checkout@v2
        with:
          lfs: false
      - name: Cache conda
        id: cache
        uses: actions/cache@v2
        with:
          path: "${{ env.PY_ENV }}.tar.gz"
          key: ${{ runner.os }}-${{ env.PY_CACHE_NUMBER }}-${{ hashFiles('environment/environment.yml') }}
      - name: Setup Miniconda
        if: steps.cache.outputs.cache-hit != 'true'
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniconda-version: "latest"
          auto-update-conda: true
          activate-environment: ${{ env.PY_ENV }}
          channel-priority: strict
          environment-file: environment/environment.yml
          auto-activate-base: false
      - name: Conda-Pack
        if: steps.cache.outputs.cache-hit != 'true'
        shell: bash -l {0}
        run: |
          conda install --yes -c conda-forge conda-pack coverage
          conda pack -f -n ${{ env.PY_ENV }} -o "${{ env.PY_ENV }}.tar.gz"
      - name: Unpack environment
        shell: bash -l {0}
        run: |
          mkdir -p "${{ env.PY_ENV }}"
          tar -xzf "${{ env.PY_ENV }}.tar.gz" -C "${{ env.PY_ENV }}"
      - name: Setup data and run pytest (Windows systems)
        if: runner.os == 'Windows'
        env:
          PYTHONPATH: libs/
        run: |
          ${{ env.PY_ENV }}/python environment/scripts/setup_data.py --mode testing
          ${{ env.PY_ENV }}/python -m pytest -v -rs tests
      - name: Setup data and run pytest (non-Windows systems)
        if: runner.os != 'Windows'
        shell: bash
        env:
          PYTHONPATH: libs/
        run: |
          source ${{ env.PY_ENV }}/bin/activate
          conda-unpack

          python environment/scripts/setup_data.py --mode testing

          if [ "$RUNNER_OS" == "Linux" ]; then
            coverage run --source=libs/ -m pytest -v -rs tests
            coverage xml -o coverage.xml
          else
            pytest -v -rs tests
          fi
      - name: Codecov upload
        if: runner.os == 'Linux'
        uses: codecov/codecov-action@v2
        with:
          files: ./coverage.xml
          name: codecov-${{ matrix.os }}-python${{ matrix.python-version }}
          fail_ci_if_error: true
          verbose: true

miltondp avatar Aug 13 '21 20:08 miltondp