[POC] Implement caching of .build for unit-tests
Motivation:
Faster tests. Less CI time / costs.
Not sure if this is what you would want at all, but if you do, I think this (possibly with minor fixes) should work properly.
Modifications:
Add an option to swift_matrix to cache the .build directory.
For now, enable this caching in unit-tests.
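For context, here is a rough sketch of what the added caching steps could look like. The step names, the `enable_caching` input, and the key format are illustrative assumptions, not the exact code in this PR:

```yaml
# Hypothetical sketch of the .build caching steps (names, input, and key format are assumptions).
- name: Restore .build cache
  if: ${{ inputs.enable_caching }}   # assumed opt-in input on swift_matrix
  id: restore-cache
  uses: actions/cache/restore@v4
  with:
    path: .build
    key: unit-tests-${{ github.repository }}-${{ runner.os }}-${{ hashFiles('./Package.resolved') }}

- name: Build and test
  run: swift test

# Only save when the restore missed, to avoid re-uploading an identical cache.
- name: Save .build cache
  if: ${{ inputs.enable_caching && steps.restore-cache.outputs.cache-hit != 'true' }}
  uses: actions/cache/save@v4
  with:
    path: .build
    key: unit-tests-${{ github.repository }}-${{ runner.os }}-${{ hashFiles('./Package.resolved') }}
```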
Result:
Faster tests. Less CI time / costs.
@FranzBusch WDYT?
We have intentionally not adopted caching yet since it trades potential time savings for cost. I am going to kick off the CI here since I want to see what the timing impact actually is, and whether the trade-off in time vs cost is worth it.
@FranzBusch are you referring to GitHub costs? Our org at work used to use GitHub Actions runners and caching, and we did have to pay for our usage, so my off-hand guess is that it'll actually decrease costs. Caching .build has a noticeable effect on run times, and as far as I can see caching is free for up to 10 GB on GitHub (in my experience it actually tolerates a bit more than that); exceeding the limit just deletes the oldest caches.
Furthermore, you can use an S3 bucket for caching with runs-on/cache. Like I've mentioned before, I'd even highly recommend moving the runners to RunsOn-managed runners. It would be a MASSIVE (5x-10x+) cost saver, as well as a pretty nice time cut for CI. It runs everything on your own AWS account and is pretty easy to set up. To be clear, I have no affiliation with RunsOn 😅. The only reasons I can think of for you all not wanting to use RunsOn are the organizational approvals you might need, and perhaps security reviews that RunsOn might need to go through. I think RunsOn has a $1,500 yearly plan that gives you access to their full code, so that might be helpful.
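For illustration, a minimal sketch of what that swap could look like, assuming runs-on/cache stays a drop-in fork of actions/cache with the same inputs (the step name and key here are made up for the example):

```yaml
# Assumed drop-in replacement for actions/cache; on RunsOn runners the cache
# is stored in an S3 bucket in your own AWS account instead of on GitHub.
- name: Cache .build
  uses: runs-on/cache@v4
  with:
    path: .build
    key: unit-tests-${{ github.repository }}-${{ runner.os }}-${{ hashFiles('./Package.resolved') }}
```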
@FranzBusch based on the code I put there, it won't use the cache in a re-run. Not that that code is mandatory, but I'll just push something to the branch to see how things go with the current changes.
@MahdiBM Can we make re-runs work? Then I can trigger the CI again to see the impact.
@FranzBusch there you go
So it appears that for swift-nio, it decreases the nightly unit-test runs from 4m40s to 4m20s. A pretty small difference.
Though this difference will get noticeably bigger the heavier the package and its builds are.
I'm not sure if there are such heavy repos in the swiftlang/Apple orgs.
If you're considering maintaining standardized swift reusable workflows for public users, caching will be necessary IMO in such a workflow.
I'll let you decide whether you think it's worth having the option for caching in the current workflows.
I keep seeing this in the logs
```
Failed to save: Unable to reserve cache with key unit-tests-swift-nio-build-nightly-6.0-Linux-307731ca0421d5a69a068b62a5d04c4356279abcf0cfd0d71aa719a762bef5a7, another job may be creating this cache. More details: Cache already exists. Scope: refs/pull/2874/merge, Key: unit-tests-swift-nio-build-nightly-6.0-Linux-307731ca0421d5a69a068b62a5d04c4356279abcf0cfd0d71aa719a762bef5a7, Version: 71ee539a69b4092a757e60e07a31650a220dbc09d2d02876da35435f74632221
Warning: Cache save failed.
```
Any idea @MahdiBM?
> So it appears that for swift-nio, it decreases the nightly unit-test runs from 4m40s to 4m20s. A pretty small difference.
That's kinda what I expected tbh. The time saved by pushing and downloading the cache versus just running a build is too minimal.
> If you're considering maintaining standardized swift reusable workflows for public users, caching will be necessary IMO in such a workflow.
These workflows, for now, are really only intended to be used by library packages in the swiftlang, apple and swift-server organizations. So at this time I think we can hold off on adding the caching. Though I appreciate you taking the time to create this PR to show the impact of it!
@FranzBusch Without the cache-save step, which usually shouldn't be triggered at all, there's another ~30s of time saved, at which point it might be worth it (a ~50s cut in this repo for nightly debug unit-test builds). If anything uses release builds, caching can have a bigger impact on those as well. I generally think 50-60s of time saved is "good" considering the whole CI usually takes under 5 minutes, but you should still decide whether that's worth it, or whether you want to have this anyway and see how it works out.
I figured out why the cache-save step is hit at all when it shouldn't be.
Before the restore step, we haven't yet checked out the package.
So `${{ hashFiles('./Package.resolved') }}` resolves to just an empty string.
Then the restore-cache step returns `cache-hit=false`.
And that's why the cache-save step is triggered when it shouldn't be (based on the `steps.restore-cache.outputs.cache-hit != 'true'` condition on the cache-save step).
Now, we could check out the package first and everything would be fine, but the problem is that this repository does not commit Package.resolved, which means `hashFiles('./Package.resolved')` will still resolve to an empty string, and we won't have a proper way to know when we should refresh the .build cache. Instead, we can use the base branch's commit SHA, like I mentioned in a comment in the code; see the sketch below.
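Roughly, the ordering and key I have in mind look like this. The step names are illustrative, and `github.event.pull_request.base.sha` is one way (an assumption, not the exact code in this PR) to get the base branch's commit on pull requests:

```yaml
# Check out the package first, so the cache key is computed against real files.
- uses: actions/checkout@v4

# Key the cache on the base branch's commit instead of Package.resolved,
# since this repository does not commit Package.resolved.
- name: Restore .build cache
  id: restore-cache
  uses: actions/cache/restore@v4
  with:
    path: .build
    key: unit-tests-${{ github.repository }}-${{ runner.os }}-${{ github.event.pull_request.base.sha }}

- name: Build and test
  run: swift test

# Only save on a cache miss; on re-runs the restore hits, so this step is skipped
# instead of failing with "Cache already exists".
- name: Save .build cache
  if: ${{ steps.restore-cache.outputs.cache-hit != 'true' }}
  uses: actions/cache/save@v4
  with:
    path: .build
    key: unit-tests-${{ github.repository }}-${{ runner.os }}-${{ github.event.pull_request.base.sha }}
```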
@FranzBusch TL;DR: the cache-save step should no longer be triggered when the save is going to fail anyway. You can give it another try if you want.