[VL] CI: Enable GHA dependency cache on static Velox build
To speed up the CI static build when the Velox and vcpkg code has not changed.
Ref: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
The dynamic build is not affected by this patch, so the overall Velox CI duration will not change.
Some links to inspect the cache:
https://api.github.com/repos/apache/incubator-gluten/actions/caches
https://api.github.com/repos/apache/incubator-gluten/actions/cache/usage
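The usage endpoint can also be read from a script. A minimal sketch, assuming the response carries the `active_caches_count` and `active_caches_size_in_bytes` fields described in the GitHub REST docs; the helper names `fetch_json` and `format_usage` are illustrative only and not part of this patch:

```python
import json
import urllib.request

USAGE_URL = "https://api.github.com/repos/apache/incubator-gluten/actions/cache/usage"


def fetch_json(url):
    """GET a GitHub REST endpoint and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def format_usage(payload):
    """Summarize a cache usage payload as '<count> caches, <size> GiB'."""
    count = payload["active_caches_count"]
    gib = payload["active_caches_size_in_bytes"] / 2**30
    return f"{count} caches, {gib:.2f} GiB"
```

Calling `format_usage(fetch_json(USAGE_URL))` would then give a one-line summary of how much of the repository's cache quota is in use.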
Run Gluten Clickhouse CI
@zhouyuan
One problem I can think of: when we update the Velox branch for some reason without changing Gluten's code, the cache has to be invalidated manually, otherwise the stale cache will still be restored.
Delete cache: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-github-actions-caches-for-a-repository-using-a-cache-key
That approach is not very developer-friendly, though, so it may be better to create a new Velox branch when appending changes.
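For reference, the manual deletion linked above could be scripted roughly as follows. This is only a sketch: it builds the `DELETE /repos/{owner}/{repo}/actions/caches?key=...` request without sending it, the helper name `build_delete_cache_request` is hypothetical, the cache key is whatever key the workflow actually uses, and the token is assumed to have permission to manage Actions caches.

```python
import urllib.parse
import urllib.request


def build_delete_cache_request(owner, repo, cache_key, token):
    """Build (but do not send) a DELETE request for all caches matching cache_key."""
    query = urllib.parse.urlencode({"key": cache_key})
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/caches?{query}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the deletion, which still has to be done by hand after every in-place Velox branch update.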
Yes, this will happen when we rebase Velox, then find that some Spark UTs fail in Gluten, and then make some fixes. Could we filter on the pull request title, e.g. check for a keyword such as forcebuildvelox?
@zhouyuan A way to trigger rebuilding is needed anyway. I will raise a new PR for that. Thanks for the suggestion.
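The title-keyword idea suggested above could look roughly like this. It is a sketch only: the keyword `forcebuildvelox` is taken from the suggestion and is not an agreed convention, and `should_force_rebuild` is a hypothetical helper that a CI step could call with the PR title to decide whether to skip the cache.

```python
# Keyword from the suggestion in this thread; not an agreed convention.
FORCE_KEYWORD = "forcebuildvelox"


def should_force_rebuild(pr_title, keyword=FORCE_KEYWORD):
    """Return True when the PR title asks to skip the cache and rebuild Velox."""
    # Case-insensitive substring match, so "[ForceBuildVelox]" also counts.
    return keyword.lower() in pr_title.lower()
```

A case-insensitive substring match keeps the trigger forgiving about brackets and capitalization in the title.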
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
| query | log/native_5145_time.csv | log/native_master_03_31_2024_0bfee3a98_time.csv | difference | percentage |
| --- | --- | --- | --- | --- |
| q1 | 35.61 | 38.72 | 3.104 | 108.72% |
| q2 | 26.25 | 23.80 | -2.449 | 90.67% |
| q3 | 37.11 | 37.26 | 0.151 | 100.41% |
| q4 | 40.53 | 38.23 | -2.301 | 94.32% |
| q5 | 70.18 | 69.53 | -0.653 | 99.07% |
| q6 | 7.44 | 7.39 | -0.050 | 99.32% |
| q7 | 84.96 | 86.10 | 1.144 | 101.35% |
| q8 | 85.25 | 85.99 | 0.736 | 100.86% |
| q9 | 121.04 | 123.88 | 2.840 | 102.35% |
| q10 | 43.86 | 44.86 | 1.006 | 102.29% |
| q11 | 20.35 | 20.75 | 0.405 | 101.99% |
| q12 | 26.14 | 28.42 | 2.272 | 108.69% |
| q13 | 46.05 | 46.88 | 0.826 | 101.79% |
| q14 | 18.78 | 19.81 | 1.029 | 105.48% |
| q15 | 30.01 | 30.54 | 0.530 | 101.77% |
| q16 | 13.37 | 14.13 | 0.758 | 105.67% |
| q17 | 100.84 | 102.65 | 1.813 | 101.80% |
| q18 | 143.86 | 142.95 | -0.912 | 99.37% |
| q19 | 13.70 | 13.59 | -0.114 | 99.17% |
| q20 | 27.06 | 28.84 | 1.777 | 106.57% |
| q21 | 228.93 | 225.54 | -3.387 | 98.52% |
| q22 | 14.16 | 16.63 | 2.466 | 117.41% |
| total | 1235.47 | 1246.46 | 10.991 | 100.89% |
Update: It seems this cache cannot be shared across PRs: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
That is OK for now, though it may need further improvement.