codecov-action
Inscrutable "Actions workflow run is stale" error
I'm getting a lot of sporadic failures in reporting, possibly due to the number of parallel builds that are attempting to submit coverage reports.
My project is configured to build and run the core tests first, which takes about four minutes, and then to run more than twenty integration test suites, each of which takes five or more minutes. It seems as though we may be dancing right on the edge of some sort of limit, possibly due to my naive understanding of the after_n_builds option.
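For context, the relevant piece of codecov.yml looks roughly like this -- a minimal sketch, where the count is illustrative and would need to match however many jobs actually upload coverage:

codecov:
  notify:
    # Hold the status/notification until this many uploads have arrived.
    # Illustrative value; it should match the 20+ upload jobs in the matrix.
    after_n_builds: 22
comment:
  after_n_builds: 22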
Unfortunately, Googling anything about {'detail': ErrorDetail(string='Actions workflow run is stale', code='not_found')} turns up nothing, so at the very least, hopefully people searching for this error will find this issue from now on.
Would you kindly explain how to increase the timeout Codecov uses while waiting for coverage segments, or, if that is not the cause, how to resolve this error?
Thank you for your assistance, as well as for an excellent product!
==> Uploading reports
url: https://codecov.io
query: branch=update%2Fjackson-core-2.12.1&commit=5e16535e81483a6a07612ba10cfe32c328469103&build=598338763&build_url=http%3A%2F%2Fgithub.com%2Ftwilio%2Fguardrail%2Factions%2Fruns%2F598338763&name=&tag=&slug=twilio%2Fguardrail&service=github-actions&flags=&pr=927&job=CI&cmd_args=n,F,Q,Z,f
-> Pinging Codecov
https://codecov.io/upload/v4?package=github-action-20210129-7c25fce&token=secret&branch=update%2Fjackson-core-2.12.1&commit=5e16535e81483a6a07612ba10cfe32c328469103&build=598338763&build_url=http%3A%2F%2Fgithub.com%2Ftwilio%2Fguardrail%2Factions%2Fruns%2F598338763&name=&tag=&slug=twilio%2Fguardrail&service=github-actions&flags=&pr=927&job=CI&cmd_args=n,F,Q,Z,f
{'detail': ErrorDetail(string='Actions workflow run is stale', code='not_found')}
404
==> Uploading to Codecov
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 182k 100 81 100 182k 400 904k --:--:-- --:--:-- --:--:-- 904k
{'detail': ErrorDetail(string='Actions workflow run is stale', code='not_found')}
Error: Codecov failed with the following error: The process '/usr/bin/bash' failed with exit code 1
Hi @blast-hardcheese, we are working to understand the issue here, but I think for now as a workaround, you can supply the Codecov upload token. Do you have a GitHub Actions CI link that we can take a look at btw?
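(For reference, supplying the token looks roughly like this in the workflow step -- a sketch assuming the token has been stored as a repository secret named CODECOV_TOKEN:)

- uses: codecov/codecov-action@v1
  with:
    # Repository upload token; the secret name here is an assumption.
    token: ${{ secrets.CODECOV_TOKEN }}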
Do you have a GitHub Actions CI link that we can take a look at btw?
Sure -- you can take a look at many of the recent failures on https://github.com/guardrail-dev/guardrail/ , one example is https://github.com/guardrail-dev/guardrail/pull/1000/checks?check_run_id=1976440163 .
I've just been re-running all the checks and usually the subsequent run is successful.
Additionally, I've moved this repo out from where it was previously hosted, https://github.com/twilio/guardrail/ , within the past 24 hours -- that may impact your investigation. If you need more samples from after the repo was moved over, I can submit them as they come in -- library upgrade PRs are the most likely to trigger this, due to the rate of submission.
More recent example after moving the repo to a new org and re-authorizing: https://github.com/guardrail-dev/guardrail/pull/1004/checks?sha=ff99a5dfa20d69e2f8519ca7d6569f5a6ebb63a8
@blast-hardcheese, unless I'm missing something, I couldn't find the above error in that latest link. Apologies if it's really blatant and I missed it, but would you mind sharing the name of the job that failed?
@thomasrockhu Ack! I didn't realize that re-running the workflow erased the failure; I thought the links were stable.
I was able to reproduce the error on an already merged PR, so this should not change:
https://github.com/guardrail-dev/guardrail/pull/1004/checks?check_run_id=2009728188
Sorry about that!
I don't know if this is related, but if this is a race condition, it very well may be: we're also experiencing the exact opposite problem, where we successfully report all after_n_builds runs (22 of them) asynchronously to codecov.io for a PR, but the callback never fires, so we never get a response for the required Codecov build check.
A normal run looks like this:

In this example, it just hung like this (I've since merged the PR, but you can still see that Codecov is not among the reported checks for that PR, meaning the callback didn't fire):

@blast-hardcheese, I think I've resolved most of the "Actions workflow run is stale" errors. Let me know if that's not the case.
As for the most recent example, the callback didn't fire because we had only received 16 builds (and not 22). It's a little challenging to see which build didn't upload properly; do you happen to know the names of the jobs?
In that particular example, it looks like some/all of the Scala 15 builds didn't run tests or try to upload to Codecov
Hi! Let me know if I should open a new issue for this, but we're having an identical problem. We're planning on reducing the size of our testing matrix in the near future; would that alleviate the problem? Otherwise, if you could take a look, that'd be great! Thanks :)
In that particular example, it looks like some/all of the Scala 15 builds didn't run tests or try to upload to Codecov
You're completely correct. I didn't realize that I had excluded some coverage uploads while also using after_n_builds -- sorry for confusing the issue here.
I haven't seen the Actions workflow run is stale error for more than a week at this point, so may I ask what you did on your end? Is this something I could have done via the codecov UI somehow, and is there a possibility of this resurfacing? I've noticed some other 👍s on the initial issue, so presumably others are running into this as well
(Also, thank you again for all your help here!)
FWIW I have been running into this as recently as yesterday in my project too - https://github.com/laurynas-biveinis/unodb/runs/2109776375?check_suite_focus=true
In my case there are two flag-separated configurations, which get uploaded in parallel. Perhaps those uploads should be serialized?
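(Roughly, the setup looks like this -- a sketch with hypothetical configuration names, in case the parallel uploads turn out to be the problem:)

jobs:
  coverage:
    strategy:
      matrix:
        config: [debug, release]   # hypothetical names for the two flag-separated configurations
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # ... build and run the tests for ${{ matrix.config }} ...
      - uses: codecov/codecov-action@v1
        with:
          flags: ${{ matrix.config }}   # one upload per matrix job, so the uploads run in parallel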
@laurynas-biveinis I'm looking into making a patch for this. We should hopefully have that particular edge case fixed this week.
I was having this problem and I found adding the Codecov token as a GitHub Actions secret helped. However, I'm now getting this error on every merge to my main branch, after the jobs for the same commit on its feature branch (pre-merge) succeeds.
Here's my log of the failure: https://github.com/briansmith/ring/runs/2172862556?check_suite_focus=true
I was having this problem and I found adding the Codecov token as a GitHub Actions secret helped.
Unfortunately, for any GitHub organization with a wider community this creates the potential for an access token to leak, so we at Nextcloud dropped our Codecov tokens from the action, since the README says they are not required for public repositories.
Our current mitigation is to report coverage only for a few CI runs, though that can potentially lower the reported coverage as some paths are only triggered by certain tests in our matrix.
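(Gating the upload on a subset of the matrix looks roughly like this -- a sketch with a hypothetical matrix key and value:)

- uses: codecov/codecov-action@v1
  # Only one matrix job reports coverage; the matrix key and version here are illustrative.
  if: matrix.php-version == '8.0'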
I was having this problem and I found adding the Codecov token as a GitHub Actions secret helped. However, I'm now getting this error on every merge to my main branch, after the jobs for the same commit on its feature branch (pre-merge) succeeds.
I was mistaken. Although I did start the process of adding a Codecov token as a secret within my GitHub Actions workflow, I never got around to hooking it up to my use of this action, so it was never used. Thus it had no effect. It seems like Codecov must have addressed the issue here on its end.
In issue #300 I suggest a different solution that doesn't require using a Codecov access token: move the uploading of coverage out of the jobs that collect it. If only one job submits coverage data to Codecov, then AFAICT you can avoid the timeout issue described above, and you can also properly minimize the permissions on the GitHub token. You'd upload the coverage data as an artifact in each job that collects coverage information, download those artifacts in the job that submits the coverage, and use "needs:" to tell GitHub Actions about the dependency between the jobs.
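A rough sketch of that shape, with placeholder job, suite, and artifact names (the coverage file path is an assumption):

jobs:
  test:
    strategy:
      matrix:
        suite: [core, integration-a, integration-b]   # placeholder suite names
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # ... run the ${{ matrix.suite }} tests and write coverage to coverage.xml ...
      - uses: actions/upload-artifact@v2
        with:
          name: coverage-${{ matrix.suite }}
          path: coverage.xml
  upload-coverage:
    needs: test                            # waits for every matrix job above to finish
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/download-artifact@v2  # with no name given, downloads all artifacts
      - uses: codecov/codecov-action@v1     # a single upload, so no parallel-upload race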
Closing as this no longer seems to be an issue.