Retry fetching repository on timeout
Problem to solve
Sometimes this action takes a long time to fetch a repository. It is usually fast, but occasionally very slow. Here is a log:
Fetching the repository
/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 origin +94d57b271584176778fd61719b2bef4b14efb94b:refs/remotes/origin/develop
remote: Enumerating objects: 27161, done.
Receiving objects: 0% (1/27161)
...
Receiving objects: 0% (65/27161), 396.00 KiB | 354.00 KiB/s
...
Receiving objects: 17% (4880/27161), 66.50 MiB | 109.00 KiB/s
Receiving objects: 17% (4880/27161), 66.56 MiB | 46.00 KiB/s
Receiving objects: 17% (4880/27161), 66.58 MiB | 41.00 KiB/s
Receiving objects: 17% (4880/27161), 66.61 MiB | 37.00 KiB/s
...
Error: The operation was canceled.
I asked GitHub support but we have not yet been able to resolve the root cause. This seems to happen only on self-hosted runners.
Solution
It would be nice to retry the fetch on timeout. For example, if git fetch has been running for more than 60 seconds (a threshold configurable via the action's inputs), the action would terminate the fetch and retry it.
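A rough sketch of what this could look like, in TypeScript (the language actions/checkout is implemented in): wrap each fetch attempt in a timeout and retry a bounded number of times. This is only an illustration under assumed behavior, not the action's actual code; the timeout and attempt values are hypothetical stand-ins for the proposed inputs, it shells out to git via plain Node child_process rather than the action's internal git helpers, and the refspec uses a placeholder SHA.

```typescript
import { spawn } from 'child_process';

// Hypothetical settings; a real implementation would read these from action inputs.
const FETCH_TIMEOUT_MS = 60_000; // terminate a fetch attempt after 60 seconds
const MAX_ATTEMPTS = 3;

// Run one `git fetch`, rejecting if it exceeds the timeout or exits non-zero.
function fetchOnce(args: string[], timeoutMs: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const git = spawn('git', args, { stdio: 'inherit' });
    const timer = setTimeout(() => {
      git.kill('SIGTERM'); // abort the stalled fetch
      reject(new Error(`git fetch exceeded ${timeoutMs} ms`));
    }, timeoutMs);
    git.on('exit', (code) => {
      clearTimeout(timer);
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`git fetch exited with code ${code}`));
      }
    });
    git.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

// Retry the fetch until it succeeds or the attempt budget is exhausted.
async function fetchWithRetry(args: string[]): Promise<void> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await fetchOnce(args, FETCH_TIMEOUT_MS);
      return;
    } catch (err) {
      console.warn(`Fetch attempt ${attempt} failed: ${err}`);
      if (attempt === MAX_ATTEMPTS) throw err;
    }
  }
}

// Example usage mirroring the fetch from the log above (<commit-sha> is a placeholder).
fetchWithRetry([
  '-c', 'protocol.version=2', 'fetch', '--no-tags', '--prune', '--progress',
  '--no-recurse-submodules', '--depth=1', 'origin',
  '+<commit-sha>:refs/remotes/origin/develop',
]).catch(() => process.exit(1));
```

A depth-1 fetch is idempotent, so simply killing a stalled attempt and starting over is safe; the main design question is whether the timeout should be a fixed input or scale with repository size.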
And now it's 2024, with 28 thumbs up and no response...
For most users, checkout's main dependency is fetching from GitHub's git backend itself.
The reality is that GitHub's git infrastructure is sometimes flaky or slow, and a fetch that should take under 20 seconds instead takes 5 minutes. This is almost never acknowledged on GitHub's status page, but it slows our CI pipeline to a crawl. Yesterday, for a while, actions/checkout was taking 8+ minutes on roughly 5% of our jobs.
What a great revenue generator for GitHub Actions, burning worker time on the fetch (for people not using self-hosted runners, at least).
It's pure hubris that this is not supported: "we don't need retry because our git backend is perfect 😇." Refusing to add it implies confidence that the git backend is never slow or flaky, which clearly does not match reality. Either GitHub's git backend needs so many nines of reliability that failures are never noticeable, or a retry feature needs to be added to actions/checkout.