ci: Use ccache to cache build objects for speeding up building

Open bthomee opened this issue 1 month ago • 6 comments

High Level Overview of Change

This change enables caching of build objects using ccache on Linux and macOS.

Context of Change

Right now, each pipeline invocation builds the source code from scratch. Although compiled Conan dependencies are cached on a remote server, the source build objects are not. We can further speed up our builds by leveraging ccache.

For Linux, the GitHub cache is used to persist the cached build objects between commits (including between PRs). Note that the GitHub cache is limited to 10 GB, with the oldest entries evicted first, a limit that we might run into. The macOS pipelines are executed on bare metal with a ccache directory located outside of the working directory, so using the GitHub cache is not needed there. As we use the Visual Studio compiler on Windows, we cannot use ccache right away; however, I found some relevant GitHub issues that I need to inspect more closely to see if we can make it work anyway.
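As a rough illustration of how ccache is typically hooked into a CMake build (not necessarily how this PR does it), the sketch below prints the standard `CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` flags only when the cache tool is found on `PATH`. The function name `launcher_flags` is hypothetical.

```shell
# Print CMake compiler-launcher flags for the given tool,
# or print nothing if the tool is not installed.
# launcher_flags is a hypothetical helper name.
launcher_flags() {
  if command -v "$1" >/dev/null 2>&1; then
    printf -- '-DCMAKE_C_COMPILER_LAUNCHER=%s -DCMAKE_CXX_COMPILER_LAUNCHER=%s\n' "$1" "$1"
  fi
}

# Example use (not executed here):
#   cmake -S . -B build $(launcher_flags ccache)
```

Guarding on availability means the same configure command still works on a runner without ccache, where the build falls back to invoking the compiler directly.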

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (non-breaking change that only restructures code)
  • [ ] Performance (increase or change in throughput and/or latency)
  • [ ] Tests (you added tests for code that already exists, or your new feature included in this PR)
  • [ ] Documentation update
  • [X] Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • [ ] Release

bthomee avatar Dec 03 '25 22:12 bthomee

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 79.1%. Comparing base (f059f0b) to head (a5e8eb0).

Additional details and impacted files

@@            Coverage Diff            @@
##           develop   #6104     +/-   ##
=========================================
- Coverage     79.1%   79.1%   -0.0%     
=========================================
  Files          836     836             
  Lines        71245   71245             
  Branches      8324    8320      -4     
=========================================
- Hits         56360   56353      -7     
- Misses       14885   14892      +7     

see 4 files with indirect coverage changes

codecov[bot] avatar Dec 03 '25 23:12 codecov[bot]

> The macOS pipelines are executed on bare metal with a ccache directory located outside of the working directory

It's not true that the ccache dir is outside of the working directory. ccache can and should be used on macOS (and we do it for Clio).

mathbunnyru avatar Dec 08 '25 13:12 mathbunnyru

Also, please take a look at Clio's implementation; many of the features can be implemented here as well:

  • https://github.com/XRPLF/clio/blob/develop/.github/workflows/reusable-build.yml
  • https://github.com/XRPLF/clio/blob/develop/cmake/Ccache.cmake (this is better to keep as a separate file, to make our approach more modular, and keep it in sync with Clio)

mathbunnyru avatar Dec 08 '25 14:12 mathbunnyru

> > The macOS pipelines are executed on bare metal with a ccache directory located outside of the working directory
>
> It's not true that the ccache dir is outside of the working directory. ccache can and should be used on macOS (and we do it for Clio).

Logging into our macOS GitHub runner shows that the ccache dir is the user's Library/Caches/ccache directory, which is outside of the working directory.

bthomee avatar Dec 08 '25 19:12 bthomee

> Also, please take a look at Clio's implementation; many of the features can be implemented here as well:
>
>   • https://github.com/XRPLF/clio/blob/develop/.github/workflows/reusable-build.yml
>   • https://github.com/XRPLF/clio/blob/develop/cmake/Ccache.cmake (this is better to keep as a separate file, to make our approach more modular, and keep it in sync with Clio)

I reluctantly moved the ccache CMake changes back into a separate file, simply for similarity with Clio. However, I don't see in what way this helps keep things more modular, since in my view it just increases the number of files in the directory (albeit by just 1) and it moves 5 lines of compiler-related statements into a different file than where all the compiler-related definitions are stored.

I further looked at the ccache setup in Clio and noticed that upload and download are always enabled together, and generally not enabled for pushes into the develop and release branches or for the nightly run. I simplified this by enabling ccache only for PR commits. This also saves me from having to pass variables between multiple levels of actions.
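The PR-only rule above could be expressed as a small gate on the event type. This is a minimal sketch, assuming the decision is made in a shell step: `GITHUB_EVENT_NAME` is the variable GitHub Actions sets for the triggering event, while the function name `ccache_enabled_for_event` is hypothetical and not taken from the workflow.

```shell
# Succeed only for pull-request events; pushes and scheduled runs are skipped.
# ccache_enabled_for_event is a hypothetical helper name.
ccache_enabled_for_event() {
  case "$1" in
    pull_request) return 0 ;;
    *) return 1 ;;
  esac
}

# GITHUB_EVENT_NAME is provided by GitHub Actions; default to empty elsewhere.
if ccache_enabled_for_event "${GITHUB_EVENT_NAME:-}"; then
  echo "ccache: enabled"
else
  echo "ccache: disabled"
fi
```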

Note that I use the config_name as the cache key. Since you distinguish between builds with coverage enabled, I also added it to the config name so it becomes a separate entry in the cache.
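To make the keying idea concrete, here is a minimal sketch of deriving a cache key from a configuration name plus a coverage flag. The function name `cache_key`, the `-coverage` suffix, and the sample configuration name are all hypothetical; the actual workflow may compose the name differently.

```shell
# Build a cache key from a config name ($1) and a coverage flag ($2).
# Coverage builds get their own key so they never share entries
# with non-coverage builds of the same configuration.
cache_key() {
  if [ "$2" = "true" ]; then
    printf '%s-coverage\n' "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Hypothetical configuration name, for illustration only.
cache_key "linux-gcc-debug" "true"
```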

Also note that sanitizers are not yet enabled, but from the sanitizers PR I recall that they will also be part of the config name. However, @mathbunnyru, I noticed you completely disabled ccache for sanitizers - what's the motivation for doing so?

Finally, what's the motivation for not using ccache for commits into the develop or release branches? Is it just to be extra safe, or is there an actual known risk of ccache not producing the correct build objects?

bthomee avatar Dec 08 '25 19:12 bthomee

> > > The macOS pipelines are executed on bare metal with a ccache directory located outside of the working directory
> >
> > It's not true that the ccache dir is outside of the working directory. ccache can and should be used on macOS (and we do it for Clio).
>
> Logging into our macOS GitHub runner shows that the ccache dir is the user's Library/Caches/ccache directory, which is outside of the working directory.

Alright, I see now that the prepare-runner action overrides this to be inside the working directory.

bthomee avatar Dec 09 '25 12:12 bthomee

@mathbunnyru @vlntb This PR is now ready, and ccache now works on all platforms.

bthomee avatar Dec 11 '25 05:12 bthomee

This PR is now ready for re-review.

Noteworthy changes and observations:

  • The behavior of the GitHub cache is not what I was looking for, and I explored other storage backends.
  • Ccache supports Redis as a remote storage backend, which I deployed, but as it turns out AWS ElastiCache requires the use of TLS, which ccache does not support. It is possible to set up a proxy that upgrades a plain connection to a secured connection, but that would require deploying a separate EC2 instance, in which case I may as well use another backend.
  • Ccache also supports HTTP as a remote storage backend, and suggests using either Nginx or Bazel Remote. Since I am familiar with Bazel and remote caching, and still hold out hope we can switch to Bazel some day, I deployed the latter as the backend on a new EC2 instance.
  • This worked fine for the Windows runner, which is deployed in the same VPC as the new EC2 instance. However, for the macOS and Linux runners this required creating a VPC peering and updating the routing table to allow these runners to connect to the cache. This has now been fully enabled.
  • Since CMake is configured to move our header files into different directories (from include/ to somewhere in the build/ directory), I had to tell ccache to ignore the creation and modification dates of those files. Subsequent runs of the same workflow now result in 100% cache hits, whereas before the hit rate was 0% or a low number.
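The HTTP backend and timestamp handling described in the list above might be configured through ccache's environment variables roughly as follows. This is a sketch under assumptions, not the PR's actual configuration: the host and port are placeholders, and `include_file_ctime` and `include_file_mtime` are assumed to be the sloppiness settings in question.

```shell
# Hypothetical cache endpoint; replace with the real host and port.
CACHE_URL="http://cache.example.internal:8080"

# Point ccache at an HTTP remote backend. The layout=bazel attribute
# makes ccache use a Bazel-compatible URL layout, as Bazel Remote expects.
export CCACHE_REMOTE_STORAGE="${CACHE_URL}|layout=bazel"

# Do not refuse to cache a compilation just because an included header
# has a very recent ctime or mtime (e.g. headers freshly copied into build/).
export CCACHE_SLOPPINESS="include_file_ctime,include_file_mtime"
```

Running `ccache --zero-stats` before a build and `ccache --show-stats` afterwards is one way to check the hit rate for a single workflow run.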

@mathbunnyru I can probably move some of the env vars to the prepare-runner action. In particular, if Clio would like to use the same remote backend, you are more than welcome to. I completely disabled local caching here, since there's no point in it when the workspace gets cleared between jobs and I'm not using the GitHub cache. This also means I don't need a cache-clearing step, since clearing only affects the local cache. Not having a local cache doesn't seem to meaningfully affect build times (the build itself may be slightly longer, but we save time by no longer needing to restore and save a cache).
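One way to get the "no local cache" behavior with recent ccache 4.x releases is the `remote_only` setting, shown here via its environment variable. Whether this PR uses that exact setting is an assumption.

```shell
# Use only the remote storage backend and skip the local cache directory.
# Statistics counters are still written locally unless stats are disabled too.
export CCACHE_REMOTE_ONLY="true"
```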

bthomee avatar Dec 16 '25 21:12 bthomee