Make the cache more Git aware
Today, cache entries are resolved in the following order:
1. An exact match on OS, job id, workflow name, matrix, and Git SHA
2. The most recent entry saved for the same OS, job id, workflow name, and matrix values
3. The most recent entry saved for the same OS and job id
4. The most recent entry saved for the same OS
It would be useful to include an additional SHA-based match case between cases 1 and 2. This case would match the most recent commit in the PR's history that has a cache entry.
Scenario: Three commits A, B, and C exist on main. Cache entries exist for each commit.
C (HEAD) - B - A
Now, consider a PR that has only merged commits up to B. It will miss case 1 (exact SHA) and fall through to case 2 (job + matrix). This would likely load the cache produced by C, which could be sub-optimal for the PR build. If setup-gradle could load the cache from commit B instead, it could lead to more FROM-CACHE hits in the PR build.
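To make the proposed match concrete, here is a minimal sketch of the selection rule, assuming the PR's commit history is available newest first (for example from `git rev-list HEAD`) and that we can tell which SHAs have a cache entry. The `CacheEntry` type and `nearestAncestorEntry` function are hypothetical names for illustration, not part of setup-gradle.

```typescript
// Hypothetical model of a saved cache entry; not setup-gradle's actual types.
interface CacheEntry {
  sha: string;     // commit the entry was saved for
  savedAt: number; // save time; any increasing value works for this sketch
}

// Returns the entry for the nearest commit in `history` that has one.
// `history` is the PR's commit list, newest first.
function nearestAncestorEntry(
  history: string[],
  entries: CacheEntry[],
): CacheEntry | undefined {
  // Keep only the most recently saved entry per SHA.
  const bySha = new Map<string, CacheEntry>();
  for (const e of entries) {
    const prev = bySha.get(e.sha);
    if (prev === undefined || e.savedAt > prev.savedAt) {
      bySha.set(e.sha, e);
    }
  }
  // Walk the history from newest to oldest and stop at the first hit.
  for (const sha of history) {
    const hit = bySha.get(sha);
    if (hit !== undefined) {
      return hit;
    }
  }
  return undefined;
}

// Scenario: main is C - B - A, and the PR has only merged up to B.
const entries: CacheEntry[] = [
  { sha: "A", savedAt: 1 },
  { sha: "B", savedAt: 2 },
  { sha: "C", savedAt: 3 },
];
// "P1" stands for the PR's own commit, which has no cache entry yet.
const prHistory = ["P1", "B", "A"];
const picked = nearestAncestorEntry(prHistory, entries);
console.log(picked?.sha); // B
```

C is never considered because it is not in the PR's history, even though its entry is the most recent one saved.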
As a human observer, it's easy to see that commit B would be a better cache to load, since we can see commit B in the PR. Similarly, we can see that C would be a bad cache to load, since commit C is not in the PR. I'm sure automating this is not so easy.
This might be challenging due to the GH cache API, but it would be good to explore this.
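One possible direction, sketched here under assumptions: the GitHub cache restore step takes a primary key plus an ordered list of restore keys, so ancestor SHAs could be turned into restore keys that are tried before the non-SHA fallback. The key layout (`prefix-sha`), the `ancestorRestoreKeys` name, and the `maxKeys` cap are all hypothetical. The real key format used by setup-gradle and any real limit on the number of keys would need to be checked.

```typescript
// Builds an ordered list of restore keys: one per ancestor commit first,
// then a SHA-less prefix as the fallback (equivalent to today's case 2).
// `prefix` stands in for the OS / job id / workflow / matrix part of the key.
function ancestorRestoreKeys(
  prefix: string,
  history: string[], // PR commit history, newest first
  maxKeys = 10,      // assumed cap on keys per restore request
): string[] {
  // Reserve one slot for the fallback key.
  const shaSlots = Math.max(0, maxKeys - 1);
  const shaKeys = history.slice(0, shaSlots).map((sha) => `${prefix}-${sha}`);
  return [...shaKeys, `${prefix}-`];
}

const keys = ancestorRestoreKeys("linux-build", ["B", "A"]);
console.log(keys); // [ 'linux-build-B', 'linux-build-A', 'linux-build-' ]
```

The cap means only the most recent few ancestors can be tried, so a PR that is far behind main would still fall through to the existing cases.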