circt icon indicating copy to clipboard operation
circt copied to clipboard

[Nightly CI] CCache hit rates have become <1%

Open teqdruid opened this issue 3 years ago • 4 comments

The nightly ccache hit rates used to be -- on average -- pretty good, but they've since dropped to <1%. This has led to every nightly build being 1 to 2 hours.

teqdruid avatar Aug 22 '22 20:08 teqdruid

Any idea when this was better? Looking through action logs it seems they've taken 1-2 hours for few months at least.

Anyway, looks like we're getting cache "misses" when restoring the various ccache data. I think it's likely they're being evicted regularly, here's the current cache state (gh-action-cache list --limit 100, doesn't seem to be a way to access w/o a tool/REST API):

Total caches size 11.32 GB

Showing 37 of 37 cache entries in llvm/circt

gcc-Release-OFF-OFF-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                               169.48 MB  refs/heads/main          3 minutes ago
clang-Release-ON-OFF-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                              78.11 MB   refs/heads/main          5 minutes ago
ccache-2022-08-26T13:28:24.241Z                                                                            98.47 MB   refs/heads/main          6 minutes ago
ccache-2022-08-26T09:09:49.993Z                                                                            98.45 MB   refs/heads/main          8 minutes ago
Linux-llvm-fe0f72d5c55a9b95c5564089e946e8f08112e995-55ca506cba6bd9ba7c19cd1fe0021dc5-gcc                   4.88 GB    refs/heads/main          8 minutes ago
Linux-llvm-fe0f72d5c55a9b95c5564089e946e8f08112e995-55ca506cba6bd9ba7c19cd1fe0021dc5-clang                 241.38 MB  refs/heads/main          8 minutes ago
Windows-llvm-relobj-fe0f72d5c55a9b95c5564089e946e8f08112e995                                               1.15 GB    refs/heads/main          8 minutes ago
gcc-Release-OFF-ON-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                                199.21 MB  refs/heads/main          10 minutes ago
ccache-2022-08-26T13:22:20.098Z                                                                            99.26 MB   refs/pull/3781/merge     12 minutes ago
gcc-Release-ON-ON-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                                 199.28 MB  refs/heads/main          12 minutes ago
ccache-2022-08-26T13:21:44.952Z                                                                            99.21 MB   refs/pull/3781/merge     12 minutes ago
ccache-2022-08-26T13:18:37.324Z                                                                            99.21 MB   refs/pull/3781/merge     16 minutes ago
clang-Debug-OFF-ON-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                                235.05 MB  refs/heads/main          20 minutes ago
clang-Release-ON-ON-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                               93.92 MB   refs/heads/main          24 minutes ago
clang-Release-OFF-ON-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                              93.94 MB   refs/heads/main          28 minutes ago
clang-Release-OFF-OFF-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                             78.12 MB   refs/heads/main          28 minutes ago
clang-Debug-ON-ON-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                                 233.75 MB  refs/heads/main          29 minutes ago
clang-Debug-OFF-OFF-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                               242.52 MB  refs/heads/main          32 minutes ago
clang-Debug-ON-OFF-135ca3edfa80b384b9bff70e6ea9193a654c8cae                                                242.37 MB  refs/heads/main          33 minutes ago
ccache-2022-08-26T07:47:29.700Z                                                                            98.22 MB   refs/heads/main          4 hours ago
ccache-2022-08-25T23:30:43.687Z                                                                            98.21 MB   refs/heads/main          6 hours ago
ccache-2022-08-26T07:01:24.806Z                                                                            98.45 MB   refs/pull/3780/merge     7 hours ago
ccache-2022-08-25T18:03:14.765Z                                                                            98.16 MB   refs/heads/main          14 hours ago
ccache-2022-08-25T23:06:56.008Z                                                                            97.91 MB   refs/pull/3765/merge     14 hours ago
ccache-2022-08-25T00:47:11.782Z                                                                            96.19 MB   refs/pull/3765/merge     15 hours ago
ccache-2022-08-25T22:12:34.486Z                                                                            98.53 MB   refs/pull/3779/merge     15 hours ago
ccache-2022-08-25T20:27:51.313Z                                                                            98.47 MB   refs/pull/3779/merge     15 hours ago
ccache-2022-08-25T15:08:33.008Z                                                                            96.62 MB   refs/heads/main          20 hours ago
ccache-2022-08-25T17:40:06.141Z                                                                            98.07 MB   refs/pull/3742/merge     20 hours ago
ccache-2022-08-25T16:07:34.412Z                                                                            97.94 MB   refs/pull/3742/merge     20 hours ago
macos-11-llvm-fe0f72d5c55a9b95c5564089e946e8f08112e995-b72a1f4828f4cbefeb1c0be0f9bc9fb3e701500c-clang      414.13 MB  refs/tags/sifive/1/14/0  21 hours ago
ubuntu-20.04-llvm-fe0f72d5c55a9b95c5564089e946e8f08112e995-b72a1f4828f4cbefeb1c0be0f9bc9fb3e701500c-clang  464.00 MB  refs/tags/sifive/1/14/0  21 hours ago
ccache-2022-08-25T13:05:32.498Z                                                                            97.65 MB   refs/pull/3742/merge     21 hours ago
ubuntu-18.04-llvm-fe0f72d5c55a9b95c5564089e946e8f08112e995-b72a1f4828f4cbefeb1c0be0f9bc9fb3e701500c-clang  470.70 MB  refs/tags/sifive/1/14/0  a day ago
ccache-2022-08-25T14:44:50.345Z                                                                            96.50 MB   refs/heads/main          a day ago
ccache-2022-08-25T14:42:43.785Z                                                                            98.83 MB   refs/pull/3766/merge     a day ago
ccache-2022-08-25T00:03:09.605Z                                                                            96.46 MB   refs/heads/main          a day ago

There are some rather large entries, maybe we can trim them down? The cache limit is 10GB (although apparently not immediately/strictly enforced). The ccache- entries might not be ideal with the date in the key string (shortIntegrationTests.yml), since I don't think we'll ever use anything but the most recent. :thinking:

dtzSiFive avatar Aug 26 '22 13:08 dtzSiFive

Looking at the latest run all ccache caches were restored (!), causing 8/16 configurations to complete in about or less than 5 minutes. These were the Release configurations, and showed 34-50% hit rates. For the ones that still took ~2hours, despite the cache being found they exhibit poor hit rates (< 1%).

Weekends are slow, so we may not be out of the woods yet re:ensuring nightly caches are available but at least they sometimes are now :smile:-- bouncing through past weekends I don't see a time the cache was found in recent history.

Looks like maybe something re:debug build is interfering/causing all caches entries to miss? Having the cache available (limited to 300MB) has negligible impact on hit rate-- in at least some cases no restored cache at all has the same hit rate as finding it successfully, for example: https://github.com/llvm/circt/runs/8050184803?check_suite_focus=true <-- no cache found, 0.87% https://github.com/llvm/circt/runs/8057440984?check_suite_focus=true <-- cache found, 0.87%

dtzSiFive avatar Aug 28 '22 21:08 dtzSiFive

IIRC, this is mostly consistent with what @mikeurbach and I were seeing when we initially set up the ccache. We think it implies that the wrong ccache is being restored, but didn't look into it further.

teqdruid avatar Aug 28 '22 22:08 teqdruid

Yes this is what we saw before: cache hits on release but not debug, and the release half of the build matrix completing fast while the debug half takes a whole rebuild because there are no hits. I spent a good deal of time reading and debugging the cache github action, and I didn't see any bug or issue where the wrong cache was restored. I think more likely, the large debug artifacts are overflowing the cache and making it essentially useless. Just looking at a local build, the debug mlir-opt on macos is 417MB, and lots of libraries are in the 5-10MB range.

mikeurbach avatar Aug 29 '22 17:08 mikeurbach