return-dispatch
return-dispatch copied to clipboard
Exhausted searching IDs in known runs
Hi,
From time to time, we would see this error "Exhausted searching IDs in known runs". See the logs below. I bet this is caused by github API not finding the relevant run. That said, is there anything we can do to mitigate this problem? Thanks!
Mon, 07 Aug 2023 16:14:00 GMT Fetching Workflow ID for smokes-cron.yml...
Mon, 07 Aug 2023 16:14:00 GMT Fetched Workflow ID: 7198060
Mon, 07 Aug 2023 16:14:01 GMT Successfully dispatched workflow:
Mon, 07 Aug 2023 16:14:01 GMT Repository: bay1inc/api-platform
Mon, 07 Aug 2023 16:14:01 GMT Branch: refs/heads/main
Mon, 07 Aug 2023 16:14:01 GMT Workflow ID: smokes-cron.yml
Mon, 07 Aug 2023 16:14:01 GMT Workflow Inputs: {"envs":"staging-test,staging-live"}
Mon, 07 Aug 2023 16:14:01 GMT Distinct ID: 29fc4ecb-c9d5-44d8-9216-aa22fc11ff89
Mon, 07 Aug 2023 16:14:01 GMT Attempt to extract run ID from steps...
Mon, 07 Aug 2023 16:14:03 GMT Exhausted searching IDs in known runs, attempt 1...
Mon, 07 Aug 2023 16:14:11 GMT Exhausted searching IDs in known runs, attempt 2...
Mon, 07 Aug 2023 16:14:18 GMT Exhausted searching IDs in known runs, attempt 3...
Mon, 07 Aug 2023 16:14:26 GMT Exhausted searching IDs in known runs, attempt 4...
Mon, 07 Aug 2023 16:14:33 GMT Exhausted searching IDs in known runs, attempt 5...
Mon, 07 Aug 2023 16:14:41 GMT Exhausted searching IDs in known runs, attempt 6...
Mon, 07 Aug 2023 16:14:48 GMT Exhausted searching IDs in known runs, attempt 7...
Mon, 07 Aug 2023 16:14:56 GMT Exhausted searching IDs in known runs, attempt 8...
Mon, 07 Aug 2023 16:15:03 GMT Exhausted searching IDs in known runs, attempt 9...
Mon, 07 Aug 2023 16:15:11 GMT Exhausted searching IDs in known runs, attempt 10...
Mon, 07 Aug 2023 16:15:18 GMT Exhausted searching IDs in known runs, attempt 11...
Mon, 07 Aug 2023 16:15:26 GMT Exhausted searching IDs in known runs, attempt 12...
Mon, 07 Aug 2023 16:15:33 GMT Exhausted searching IDs in known runs, attempt 13...
Mon, 07 Aug 2023 16:15:41 GMT Exhausted searching IDs in known runs, attempt 14...
Mon, 07 Aug 2023 16:15:48 GMT Exhausted searching IDs in known runs, attempt 15...
Mon, 07 Aug 2023 16:15:55 GMT Exhausted searching IDs in known runs, attempt 16...
Mon, 07 Aug 2023 16:16:04 GMT Exhausted searching IDs in known runs, attempt 17...
Mon, 07 Aug 2023 16:16:12 GMT Exhausted searching IDs in known runs, attempt 18...
Mon, 07 Aug 2023 16:16:20 GMT Exhausted searching IDs in known runs, attempt 19...
Mon, 07 Aug 2023 16:16:27 GMT Exhausted searching IDs in known runs, attempt 20...
Mon, 07 Aug 2023 16:16:35 GMT Exhausted searching IDs in known runs, attempt 21...
Mon, 07 Aug 2023 16:16:42 GMT Exhausted searching IDs in known runs, attempt 22...
Mon, 07 Aug 2023 16:16:55 GMT Exhausted searching IDs in known runs, attempt 23...
Mon, 07 Aug 2023 16:16:57 GMT Exhausted searching IDs in known runs, attempt 24...
Mon, 07 Aug 2023 16:17:05 GMT Exhausted searching IDs in known runs, attempt 25...
Mon, 07 Aug 2023 16:17:12 GMT Exhausted searching IDs in known runs, attempt 26...
Mon, 07 Aug 2023 16:17:20 GMT Exhausted searching IDs in known runs, attempt 27...
Mon, 07 Aug 2023 16:17:27 GMT Exhausted searching IDs in known runs, attempt 28...
Mon, 07 Aug 2023 16:17:35 GMT Exhausted searching IDs in known runs, attempt 29...
Mon, 07 Aug 2023 16:17:42 GMT Exhausted searching IDs in known runs, attempt 30...
Mon, 07 Aug 2023 16:17:50 GMT Exhausted searching IDs in known runs, attempt 31...
Mon, 07 Aug 2023 16:17:57 GMT Exhausted searching IDs in known runs, attempt 32...
Mon, 07 Aug 2023 16:18:05 GMT Exhausted searching IDs in known runs, attempt 33...
Mon, 07 Aug 2023 16:18:13 GMT Exhausted searching IDs in known runs, attempt 34...
Mon, 07 Aug 2023 16:18:20 GMT Exhausted searching IDs in known runs, attempt 35...
Mon, 07 Aug 2023 16:18:28 GMT Exhausted searching IDs in known runs, attempt 36...
Mon, 07 Aug 2023 16:18:35 GMT Exhausted searching IDs in known runs, attempt 37...
Mon, 07 Aug 2023 16:18:43 GMT Exhausted searching IDs in known runs, attempt 38...
Mon, 07 Aug 2023 16:18:50 GMT Exhausted searching IDs in known runs, attempt 39...
Mon, 07 Aug 2023 16:19:05 GMT Exhausted searching IDs in known runs, attempt 40...
Mon, 07 Aug 2023 16:19:07 GMT Exhausted searching IDs in known runs, attempt 41...
Mon, 07 Aug 2023 16:19:12 GMT Error: Failed to complete: Timeout exceeded while attempting to get Run ID
Mon, 07 Aug 2023 16:19:12 GMT Warning: Does the token have the correct permissions?
Mon, 07 Aug 2023 16:19:12 GMT Error: Timeout exceeded while attempting to get Run ID
Apologies, I missed this issue being raised
Yes typically this would indicate that the run fails to start, do you find that the run is starting and this action is failing to get the ID?
Mitigate in which sense? Are you using this action to simply kick off another action and are not concerned for the ID? Or?
What I see often is that a run was successfully triggered (the "Distinct ID" matches the expected ID) but this action couldn't find it after many tries.
Hmmm I wonder if the rework i'm doing for v2 will help. In the interim, can you please post the full log with debug logging enabled?
You can enable this by adding a secret ACTIONS_RUNNER_DEBUG
with true
(info on this here https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging)
Good idea. I just added that secret. Will keep you posted. Thanks!
Happy to re-open this should there still be any issues, let me know :)
So this happened today.
It failed to find a workflow run with the matching dispatch id:
However, the workflow run was dispatched successfully (the id matches)
How long is it between the job starting and the action dispatching? do you use custom runners or gh runners?
The action was dispatched at Thu, 14 Sep 2023 04:44:27 GMT
and the job started running at Thu, 14 Sep 2023 04:44:35 GMT
so it took 8 seconds.
We use self-hosted runners for this job. We used to use self-hosted runners and had this problem as well.
Weird, does the output for return-dispatch to catch happen before anything else?
Sorry what you mean exactly?
Ah sorry, didn't explain myself very well, does this step happen before any other steps?
- name: echo distinct ID ${{ github.event.inputs.distinct_id }}
run: echo ${{ github.event.inputs.distinct_id }}
No, that job only has one step like this:
id:
runs-on: [infra, us-central1, build-small]
if: ${{ github.event.inputs.distinct_id != null }}
steps:
- name: Distinct ID ${{ github.event.inputs.distinct_id }}
run: echo ${{ github.event.inputs.distinct_id }}
@jduan-highnote Could it be because your targeted repository triggers a lot of workflows on regular basis? I see that the API call to get the workflow runs only fetches 5 when a branch name is specified as ref
, and only 10 otherwise:
https://github.com/Codex-/return-dispatch/blob/78fb303cfd7699e9c743c2c76479906f0e970b80/src/api.ts#L135-L147
If more than 5 workflows have been triggered at the same time, then your run might never be in the first page of results and therefore impossible to find. It would be great to increase the per_page
value or make it configurable.
@pmrotule that's great insight. It's certainly possible that the target workflow has more than 5 concurrent runs at some point. It'd be great to make this a configurable param! @Codex-
@jduan-highnote I've cut a new release with some enhancements courtesy of @pmrotule
Let me know how it goes (v1 tag has been updated so if you're using the v1 tag you shouldn't have to change anything)
@Codex- can you given a summary of the recent enhancements? We do use v1
across the board so we will pick up the enhancements automatically.
An octokit iterator is now being used to check for runs across multiple API response pages
Nice! I'll keep you posted.
An octokit iterator is now being used to check for runs across multiple API response pages
@Codex- The iterator is only used to get the workflow id within getWorkflowId
, but getWorkflowRunIds
is still only looking at the first 5 runs:
https://github.com/Codex-/return-dispatch/blob/dbbbe1d543b3d42aee5613cd7eb4dea947dc54a5/src/api.ts#L148-L160
So I think that my assumption from last month is still valid:
If more than 5 workflows have been triggered at the same time, then your run might never be in the first page of results and therefore impossible to find.
A nice way to solve this would be to use an iterator within getWorkflowRunIds
as well, but with a limit since going through all workflow runs in the history of a repository could take a very long time. So something like this:
const workflowRunIterator = octokit.paginate.iterator(
octokit.rest.actions.listWorkflowRuns,
{
owner: config.owner,
repo: config.repo,
workflow_id: workflowId,
per_page: 20,
},
);
let runIds: number[] = [];
let page = 1;
const maxPage = 5;
for await (const response of workflowRunIterator) {
if (response.status !== 200) {
throw new Error(
`Failed to get Workflow runs, expected 200 but received ${response.status}`,
);
}
// wrong type definition
const runs: typeof response.data.workflow_runs = response.data;
runIds = runs.map((workflowRun) => workflowRun.id);
if (runIds.length || page === maxPage) {
break;
}
page++;
}
I am also running into this issue:
Successfully dispatched workflow: Repository: *** Branch: *** Workflow ID: ***.yml Distinct ID: 348a1190-04f9-4c90-b5e5-ef3608dd4265 Attempt to extract run ID from steps... Exhausted searching IDs in known runs, attempt 1... Exhausted searching IDs in known runs, attempt 2... Exhausted searching IDs in known runs, attempt 3... Exhausted searching IDs in known runs, attempt 4... Exhausted searching IDs in known runs, attempt 5... Exhausted searching IDs in known runs, attempt 6... Exhausted searching IDs in known runs, attempt 7... Exhausted searching IDs in known runs, attempt 8... Exhausted searching IDs in known runs, attempt 9... Exhausted searching IDs in known runs, attempt 10... Error: getWorkflowRunJobs: An unexpected error has occurred: Server Error Error: Failed to complete: Server Error Warning: Does the token have the correct permissions? Error: Server Error
The weirdest thing is that it will keep attempting to find the ID while the successfully triggered pipeline is running and then hits the Server Error or timeout error once the pipeline is done. If there's anything I can add to help with troubleshooting, please let me know. Thank you for working on this, this functionality should be included in Github Actions by default.
@ChrisPage-AT When the search is being exhausted, if you do an API request yourself, is the run present?
Thank you for working on this, this functionality should be included in Github Actions by default.
Totally agree, really hoping we see it one day
@Codex- Using the api calls listed in your readme, I can find the triggered pipeline with List Workflow Runs
and use the id to see the jobs with List Jobs for a workflow run
. However I am noticing that neither of those calls seem to list the distinct_id that I'm passing in. Are you doing a different call to obtain that?
I started seeing this error recently. You can see that the workflow was dispatched successfully and the ID matches.
Server error? Unusual, can you enable debug logging? https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging
Can hopefully get some more info there
Interesting. I wasn't aware of enabling debug logs for runners. Should I enable both runner & step debug logs?
I think step but I'd do both to get a full picture
I enabled debug log a few weeks ago and today the problem happened for the first time. Please see the attached logs.
Thanks for that @jduan-highnote , I'll have a dig when I get a chance. Really odd that the ID being returned is so short, almost looks truncated.
I've deleted your post so as to not reveal anything about your infra
Thank you for being so thoughtful!
Here's another occurrence if it helps:
Fri, 26 Apr 2024 19:25:45 GMT
Run codex-/return-dispatch@v1
Fri, 26 Apr 2024 19:25:45 GMT Fetching Workflow ID for prbox_update_routing.yaml...
Fri, 26 Apr 2024 19:25:45 GMT Fetched Workflow ID: 88382506
Fri, 26 Apr 2024 19:25:46 GMT Successfully dispatched workflow:
Fri, 26 Apr 2024 19:25:46 GMT Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT Branch: main
Fri, 26 Apr 2024 19:25:46 GMT Workflow ID: prbox_update_routing.yaml
Fri, 26 Apr 2024 19:25:46 GMT Workflow Inputs: {"services":"non-pci","tenants":"test"}
Fri, 26 Apr 2024 19:25:46 GMT Distinct ID: f9c198f0-84b8-4943-a7cf-31be5bcec42b
Fri, 26 Apr 2024 19:25:46 GMT Attempt to extract run ID from steps...
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Attempting to fetch Run IDs for Workflow ID 88382506
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]failed to get branch for ref: main, please raise an issue with this git ref.
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Fetched Workflow Runs:
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Branch: undefined
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Workflow ID: 88382506
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Runs Fetched: [8852916369,8852872434,8852351822,8852314544,8852302751,8852270907,8851776517,8851726747,8851694157,8851681964]
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Attempting to get step names for Run IDs: [8852916369,8852872434,8852351822,8852314544,8852302751,8852270907,8851776517,8851726747,8851694157,8851681964]
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Workflow Run ID: 8852916369
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Jobs Fetched: [24312583511,24312583707,24312583931,24312591506,24312619999,24312642954] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID 61897684-539d-4539-b241-54ec6159841e,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Workflow Run ID: 8852872434
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Jobs Fetched: [24312437185,24312437385,24312437629,24312444058,24312464693,24312481469] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID fcc9476c-2cae-43a1-8850-ae2df9728507,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:47 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Workflow Run ID: 8852351822
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Jobs Fetched: [24310882803,24310883124,24310883576,24310892691,24310930047,24310957296] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID 0fec8571-ebc2-49f5-9364-747a1b6d3649,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:47 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Workflow Run ID: 8852314544
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Jobs Fetched: [24310757841,24310757992,24310758179,24310764698,24310791043,24310814079] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID 6822826a-513e-4f60-8535-fe7573de521e,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:57 GMT Error: getWorkflowRunJobs: An unexpected error has occurred: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]HttpError: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:19832:26
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async getWorkflowRunJobSteps (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23186:22)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async run (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23269:25)
Fri, 26 Apr 2024 19:25:57 GMT Error: Failed to complete: Server Error
Fri, 26 Apr 2024 19:25:57 GMT Warning: Does the token have the correct permissions?
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]HttpError: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:19832:26
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async getWorkflowRunJobSteps (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23186:22)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async run (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23269:25)
Fri, 26 Apr 2024 19:25:57 GMT Error: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]Node Action run completed with exit code 1
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]Finishing: Trigger routing updates
@Codex- do you have enough logs for debugging? I'd like to turn off debug logs for our github actions.