
high usage of github hosted runners

Open cwsmith opened this issue 5 months ago • 9 comments

Our current matrix of configuration options has increased our github hosted runner usage from about 24 minutes (https://github.com/SCOREC/core/actions/runs/13558913948/usage) three months ago to 2 hours and 10 minutes (https://github.com/SCOREC/core/actions/runs/15589036256/usage).

A couple of thoughts:

  • do we need all these combinations?
  • are we caching dependencies?
  • would switching to ninja (from make) help? (thx @bobpaw)
  • should we move the majority of these tests to a self-hosted runner with a manual trigger (i.e., /runtests)? If we run on the fast filesystem and use more cpu cores for the builds and tests we may come out ahead (vs the elapsed github hosted time of ~10mins). If we take this approach then the automatic tests could be reduced to a small subset.
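A rough sketch of what the manual-trigger idea in the last bullet could look like, for discussion only. The workflow name, the `self-hosted` label, the commenter check, and the build commands are all assumptions, not anything that exists in the repo today; `ctest --test-dir` also assumes CMake 3.20 or newer on the runner.

```yaml
# Sketch only: names, labels, and build commands below are placeholders.
name: full-tests-on-demand
on:
  issue_comment:
    types: [created]

jobs:
  full-tests:
    # Run only for pull request comments that start with /runtests, and only
    # when the commenter is an owner, member, or collaborator of the repo.
    if: >-
      github.event.issue.pull_request &&
      startsWith(github.event.comment.body, '/runtests') &&
      contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association)
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
        with:
          # issue_comment workflows run against the default branch,
          # so fetch the pull request head explicitly.
          ref: refs/pull/${{ github.event.issue.number }}/head
      - name: Configure, build, and test
        run: |
          cmake -S . -B build
          cmake --build build --parallel
          ctest --test-dir build --output-on-failure --parallel 8
```

The commenter check matters because a self-hosted runner would be executing pull request code on our own machine; the automatic GitHub-hosted workflow would then only need the small subset.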

cwsmith avatar Jun 11 '25 16:06 cwsmith

Using ccache will also help

bobpaw avatar Jun 11 '25 16:06 bobpaw

cool! https://ccache.dev/ Assuming each matrix entry/combination is running in its own vm instance, I wonder if a 'central' ccache could be used.

cwsmith avatar Jun 11 '25 16:06 cwsmith

It's been a few years since I used the GitHub actions ccache features, but I think the usual approach is to load the cache as an artifact before building and write it afterward. I'm sure there's a good guide out there.

bobpaw avatar Jun 11 '25 16:06 bobpaw

We can also make better use of the on.*.paths/on.*.paths-ignore keys to avoid running CI for documentation-only updates. The one caveat is to make sure that .github/workflows is included in the watched paths so that changes to the workflow itself still trigger runs.
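Something along these lines, as a sketch. The `paths` key is real workflow syntax, but the source directories listed here are hypothetical and would need to match the actual repo layout.

```yaml
# Sketch only: the directory patterns are placeholders for the real source tree.
on:
  pull_request:
    paths:
      - '.github/workflows/**'   # keep this so workflow edits still trigger CI
      - '**/CMakeLists.txt'
      - 'cmake/**'               # hypothetical directory
      - 'src/**'                 # hypothetical directory
```

Note that `paths` and `paths-ignore` cannot both be used for the same event, so it is one or the other. With `paths-ignore` (for example ignoring `**.md`), workflow changes trigger runs without being listed.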

bobpaw avatar Jun 11 '25 16:06 bobpaw

There is a guide on ccache here: https://cristianadam.eu/20200113/speeding-up-c-plus-plus-github-actions-using-ccache/

There might be a storage tradeoff/consideration, especially since our builds tend to be large.
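As a starting point, here is a minimal sketch of the restore-before/save-after pattern using `actions/cache`, with a size cap to keep storage in check. The step names, cache key layout, `.ccache` location, and the 500M limit are assumptions for illustration; the key fields assume a matrix with the axes shown in the key.

```yaml
# Sketch only: key layout, cache location, and size limit are placeholders.
steps:
  - uses: actions/checkout@v4

  - name: Install ccache
    run: sudo apt-get update && sudo apt-get install -y ccache

  - name: Restore and save the ccache directory
    uses: actions/cache@v4
    with:
      path: ${{ github.workspace }}/.ccache
      # One cache per matrix entry; the commit SHA makes each run save a fresh copy.
      key: ccache-${{ matrix.compiler.name }}-${{ matrix.build_type }}-${{ matrix.no_mpi }}-${{ matrix.cxx_standard }}-${{ matrix.metis }}-${{ github.sha }}
      # Fall back to the most recent cache for the same matrix entry.
      restore-keys: |
        ccache-${{ matrix.compiler.name }}-${{ matrix.build_type }}-${{ matrix.no_mpi }}-${{ matrix.cxx_standard }}-${{ matrix.metis }}-

  - name: Configure with ccache as the compiler launcher
    env:
      CCACHE_DIR: ${{ github.workspace }}/.ccache
      CCACHE_MAXSIZE: 500M   # cap per-entry storage
    run: >
      cmake -S . -B build
      -DCMAKE_BUILD_TYPE=${{ matrix.build_type }}
      -DCMAKE_C_COMPILER_LAUNCHER=ccache
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache

  - name: Build
    env:
      CCACHE_DIR: ${{ github.workspace }}/.ccache
      CCACHE_MAXSIZE: 500M
    run: cmake --build build --parallel
```

Each matrix entry keeps its own cache here, so total storage is roughly the size cap times the number of entries. Sharing one cache across entries would save space but probably lower the hit rate.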

bobpaw avatar Jun 11 '25 16:06 bobpaw

Working on #500 to add ccache, but we should also look at pruning the run matrix. Specifically:

  • Do we need to run clang/gcc for every combination? Or only for debugging to make use of different warnings?
  • Do we need to test C++20 with all configs?

bobpaw avatar Jun 23 '25 17:06 bobpaw

Thank you; much appreciated.

No, we don't need clang/gcc and C++20 for all combinations.

cwsmith avatar Jun 23 '25 17:06 cwsmith

Is this about what we want? Not sure when we do want them.

exclude:
  - compiler: { name: LLVM }
    build_type: Release
  - cxx_standard: 20
    build_type: Release

bobpaw avatar Jun 23 '25 23:06 bobpaw

develop currently has:

      matrix:
        compiler:
          - { name: GNU, CC: gcc-10, CXX: g++-10 }
          - { name: LLVM, CC: clang, CXX: clang++ }
        build_type: [Debug, Release]
        no_mpi: [OFF, ON]
        cxx_standard: [11, 20]
        metis: [OFF, ON]

From these docs (https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/running-variations-of-jobs-in-a-workflow#excluding-matrix-configurations), my understanding is that your exclude statement would remove every LLVM + Release combination and every C++20 + Release combination. IIUC, that would bring us from 32 configs to 20. That seems like a good start, and with your ccache work I suspect we'll be in good shape. 👍
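For anyone checking the count, here is the arithmetic, assuming the exclude entries match partially the way the docs describe. A and B are just labels for this note: A is the set of LLVM + Release configs and B is the set of C++20 + Release configs.

```latex
\begin{aligned}
\text{total} &= 2^5 = 32 \\
|A| &= 2^3 = 8 \quad \text{(free axes: no\_mpi, cxx\_standard, metis)} \\
|B| &= 2^3 = 8 \quad \text{(free axes: compiler, no\_mpi, metis)} \\
|A \cap B| &= 2^2 = 4 \quad \text{(LLVM, Release, C++20; free axes: no\_mpi, metis)} \\
\text{remaining} &= 32 - (8 + 8 - 4) = 20
\end{aligned}
```

The overlap has to be added back once because the LLVM + Release + C++20 configs match both exclude entries.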

cwsmith avatar Jun 24 '25 00:06 cwsmith

The current .github/workflows/cmake.yml takes 31 minutes to run (https://github.com/SCOREC/core/actions/runs/16089922393/usage). That is low enough that it is not a concern at this point.

cwsmith avatar Jul 24 '25 14:07 cwsmith