Emscripten sanity check deletes cache during build

Open musicEnfanthen opened this issue 4 years ago • 5 comments

Hi and thanks for this action.

We are encountering a weird issue that is hard both to track down and to reproduce:

Our use case: we use the setup-emsdk action to install and cache emscripten in a "pre-cache" job, and then restore and activate it from the cache in a following job, to avoid multiple downloads and installations in matrix builds. Our setup looks as follows:

env:
  EMSCRIPTEN_VERSION: latest
  EMSCRIPTEN_CACHE_FOLDER: emsdk-cache

<snip>
jobs:
  setup_emscripten:
    name: Set up and cache emscripten
    runs-on: ubuntu-20.04
    steps:
      - name: Set up cache
        uses: actions/cache@v2
        id: cache
        with:
          path: ${{ env.EMSCRIPTEN_CACHE_FOLDER }}-${{ github.run_id }}
          key: ${{ runner.os }}-emsdk-${{ env.EMSCRIPTEN_VERSION }}-${{ github.run_id }}

      - name: Set up emsdk
        uses: mymindstorm/setup-emsdk@v7
        with:
          version: ${{ env.EMSCRIPTEN_VERSION }}
          actions-cache-folder: ${{ env.EMSCRIPTEN_CACHE_FOLDER }}-${{ github.run_id }}
          no-cache: true

<snip>

  build_js:
    runs-on: ubuntu-20.04
    needs: [setup_emscripten]
    strategy:
      matrix:
        toolkit:
          <4 build options>
    steps:
      - name: Checkout main repo
        uses: actions/checkout@v2

      - name: Restore cache
        id: restore_cache
        uses: actions/cache@v2
        with:
          path: ${{ env.EMSCRIPTEN_CACHE_FOLDER }}-${{ github.run_id }}
          key: ${{ runner.os }}-emsdk-${{ env.EMSCRIPTEN_VERSION }}-${{ github.run_id }}

      - name: Set up emsdk (cache not found)
        uses: mymindstorm/setup-emsdk@v7
        if: steps.restore_cache.outputs.cache-hit != 'true'
        with:
          version: ${{ env.EMSCRIPTEN_VERSION }}
          no-cache: true
      - name: Set up emsdk (cache found)
        if: steps.restore_cache.outputs.cache-hit == 'true'
        uses: mymindstorm/setup-emsdk@v7
        with:
          version: ${{ env.EMSCRIPTEN_VERSION }}
          actions-cache-folder: ${{ env.EMSCRIPTEN_CACHE_FOLDER }}-${{ github.run_id }}
          no-cache: true

The issue: in some build runs, the emscripten config seems to change during the run, and emscripten's sanity check then clears the cache. This happens unpredictably: sometimes the build works without problems, sometimes the issue appears.

You can find a failing run in https://github.com/musicEnfanthen/verovio/actions/runs/322146695 . The sanity check info is thrown in: https://github.com/musicEnfanthen/verovio/runs/1292826609?check_suite_focus=true#step:8:55

Is this a known issue? Is there something wrong in our setup?

Thanks in advance.

musicEnfanthen • Oct 22 '20 14:10

Hi, thanks for the detailed report! This is not a known issue with the action. I think it is caused by combining latest with caching. Preferably, declare a specific version for cached builds and use latest only for builds without caching. Combining the two can be problematic: for example, if a cache is detected, the "latest" version will not be downloaded. I should probably have the action raise an error when that kind of config is specified.

Other things I noticed:

  • This may be caused by appending run_id to the cache folder: the changing folder name may be confusing emscripten about which cache it is using.
  • You don't need the if statement; the action handles a missing cache folder just fine.
  • actions-cache-folder implies no-cache

This example config from the README should help:

env:
  EM_VERSION: 1.39.18
  EM_CACHE_FOLDER: 'emsdk-cache'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup cache
        id: cache-system-libraries
        uses: actions/cache@v2
        with:
          path: ${{env.EM_CACHE_FOLDER}}
          key: ${{env.EM_VERSION}}-${{ runner.os }}
      - uses: mymindstorm/setup-emsdk@v7
        with:
          version: ${{env.EM_VERSION}}
          actions-cache-folder: ${{env.EM_CACHE_FOLDER}}
      - name: Build library
        run: make -j2
      - name: Run unit tests
        run: make check
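
Applied to the two-job layout from the report, the notes above might look roughly like the sketch below. It is untested and only illustrative: the version number and variable names are carried over from the README example, the job names from the report, and the build matrix is left out for brevity.

```yaml
env:
  EM_VERSION: 1.39.18            # pinned version, as in the README example
  EM_CACHE_FOLDER: 'emsdk-cache'

jobs:
  setup_emscripten:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/cache@v2
        with:
          path: ${{ env.EM_CACHE_FOLDER }}
          key: ${{ env.EM_VERSION }}-${{ runner.os }}
      # actions-cache-folder implies no-cache, so no-cache is not set here.
      - uses: mymindstorm/setup-emsdk@v7
        with:
          version: ${{ env.EM_VERSION }}
          actions-cache-folder: ${{ env.EM_CACHE_FOLDER }}

  build_js:
    needs: [setup_emscripten]
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
      - uses: actions/cache@v2
        with:
          path: ${{ env.EM_CACHE_FOLDER }}
          key: ${{ env.EM_VERSION }}-${{ runner.os }}
      # No `if:` on cache-hit: the action copes with a missing cache folder.
      - uses: mymindstorm/setup-emsdk@v7
        with:
          version: ${{ env.EM_VERSION }}
          actions-cache-folder: ${{ env.EM_CACHE_FOLDER }}
```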

mymindstorm • Oct 22 '20 22:10

Thanks a lot for looking into this and your support!

I tried setting a specific version and removing the run_id. Although emscripten still seems to clear the cache, the job no longer fails: https://github.com/musicEnfanthen/verovio/runs/1295567024?check_suite_focus=true#step:7:53

The idea behind appending run_id to the cache folder was to have a fresh emscripten build for every new run of the complete workflow. In the scenario from your README, we would install the specified emscripten version on the first workflow run and then keep using it from the cache until we manually change the version number (or cache folder), right?
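
To make the difference concrete, the two keying schemes discussed so far can be put side by side. This is only an illustrative fragment assembled from the two configs above, not a recommended setup; the comments describe the expected behaviour of actions/cache rather than anything verified in this thread.

```yaml
# Variant A (original report): the run ID is part of the key and the path,
# so the first job of every workflow run misses the cache and installs
# emsdk afresh; later jobs of the same run can restore it.
- uses: actions/cache@v2
  with:
    path: ${{ env.EMSCRIPTEN_CACHE_FOLDER }}-${{ github.run_id }}
    key: ${{ runner.os }}-emsdk-${{ env.EMSCRIPTEN_VERSION }}-${{ github.run_id }}

# Variant B (README example): the key depends only on version and OS,
# so the cached install is reused across workflow runs until the version
# changes or the cache entry is evicted.
- uses: actions/cache@v2
  with:
    path: ${{ env.EM_CACHE_FOLDER }}
    key: ${{ env.EM_VERSION }}-${{ runner.os }}
```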

Regarding latest: thanks for the hint. Setting a specific version does indeed seem to solve the issue (though it is hard to say, since it occurs so unpredictably). Just one thought: before GitHub Actions, we would have installed and activated emscripten from bash like so:

# Fetch the latest registry of available tools.
./emsdk update

# Download and install the latest SDK tools.
./emsdk install latest

# Set up the compiler configuration to point to the "latest" SDK.
./emsdk activate latest

which seems to be what emscripten recommends: https://emscripten.org/docs/tools_reference/emsdk.html#how-do-i-just-get-the-latest-sdk

Do you think it would be feasible to support latest? Otherwise you would always have to update the emscripten version manually, wouldn't you? But I see your point that it could be hard to detect whether the version coming from the cache corresponds to latest.

Thanks again.

musicEnfanthen • Oct 23 '20 08:10

> The idea behind appending run_id to the cache folder was to have a fresh emscripten build for every new run of the complete workflow. In the scenario from your README, we would install the specified emscripten version on the first workflow run and then keep using it from the cache until we manually change the version number (or cache folder), right?

Sorry, I misread your config and didn't notice that the run ID was in the cache key. Your original config should have been fine.

> Do you think it would be feasible to support latest? Otherwise you would always have to update the emscripten version manually, wouldn't you? But I see your point that it could be hard to detect whether the version coming from the cache corresponds to latest.

This action is essentially a glorified version of that. Manually using a bash script and caching like you did should give a similar result.

  • Is the emcc -v in setup_emscripten generating a sanity check file that gets saved to the cache?
  • You could disable the sanity checks using EMCC_SKIP_SANITY_CHECK/EM_IGNORE_SANITY, or use FROZEN_CACHE in the emscripten config (this makes the cache read-only, which might be too much trouble).
  • Run emcc verbosely; it prints the sanity check results in its debug output.
  • https://github.com/emscripten-core/emscripten/blob/6e2c28717380839dc5e7fdaebe122fca6d1120bb/tools/shared.py#L542-L551
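
As a rough sketch of the second and third points, a build step could set the relevant environment variables directly. The step name and make command are borrowed from the README example; the value '1' and the use of EMCC_DEBUG to get verbose output are assumptions on my part, so check them against the linked shared.py for the emscripten version in use.

```yaml
      - name: Build library
        run: make -j2
        env:
          # Either variable is meant to bypass the sanity check that
          # clears the cache; '1' is an assumed value.
          EM_IGNORE_SANITY: '1'
          # EMCC_SKIP_SANITY_CHECK: '1'
          # Assumed way of running emcc verbosely to see sanity check results.
          EMCC_DEBUG: '1'
```

Skipping the check hides the symptom rather than explaining why the config changed, so the debug output is probably worth reading first.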

mymindstorm • Oct 23 '20 15:10

Hi, sorry for the late reply and thanks for your pointers.

We tried various things, but in the end stuck with the latest version while removing the run ID from the cache key. Since then, the issue has not occurred again.

Thanks again for your great support!

Can be closed.

musicEnfanthen • Nov 13 '20 10:11

I'm glad you were able to figure out a solution! I'll keep this open for the time being, just in case someone else has the same problem or a definitive solution.

mymindstorm • Nov 15 '20 21:11