Self-hosted workers fail immediately, get marked "offline" in the runners list.
I am frankly not sure whether this is an issue on the CML side, but let me describe it.
CML versions tested: 0.11.0 and 0.17.0
Cloud provider: AWS
Remark: the very same workflow worked when I last used it (3 months ago)
- Deploy self-hosted runner:

  [...]
  cml runner \
    --cloud=aws \
    --cloud-region=eu-west-1 \
    --cloud-type=g3s.xlarge \
    --cloud-spot \
    --single \
    --cloud-startup-script=$(echo 'echo "$(curl https://github.com/${{ github.actor }}.keys)" >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0) \
    --labels=debug
  [...]

  This deployment job finishes successfully, but when it finishes, the instance (as checked in the AWS console) has not yet performed its status checks and is still in the Initialisation stage (this was not the case the last time the workflow worked, 3 months ago).
- The next job (which runs on the self-hosted runner) gets closed basically immediately (in 4s) with:

  The runner has received a shutdown signal

  although the instance itself is not getting cancelled: it goes through the AWS status checks and remains running (to clarify: the instance was deployed as single).
One more thing: if I deploy the worker as reusable it will be marked as offline in the list of workers after the job fails and will not be accessible…
I deployed the reusable instance and got logs after failure:
ubuntu@ip-172-31-32-70:~$ journalctl -u cml.service -f
-- Logs begin at Thu 2022-07-21 01:23:30 UTC. --
Jul 22 11:36:49 ip-172-31-32-70 cml.sh[2440]: {"level":"info","message":"Outputs: 0"}
Jul 22 11:36:49 ip-172-31-32-70 cml.sh[2440]: {"level":"info","message":"Connected to acpid service."}
Jul 22 11:37:18 ip-172-31-32-70 cml.sh[2440]: {"date":"2022-07-22T11:37:18.362Z","level":"info","message":"runner status","repo":"https://github.com/xxxx/yyyy","status":"ready"}
Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"date":"Fri Jul 22 2022 11:37:30 GMT+0000 (Coordinated Universal Time)","error":{"name":"HttpError","request":{"headers":{"accept":"application/vnd.github.v3+json","authorization":"token [REDACTED]","user-agent":"octokit-rest.js/18.0.0 octokit-core.js/3.6.0 Node.js/16.16.0 (linux; x64)"},"method":"GET","request":{"agent":{}},"url":"https://api.github.com/repos/xxxx/yyyy/actions/runs?status=queued"},"response":{"data":{"documentation_url":"https://docs.github.com/rest/reference/actions#list-workflow-runs-for-a-repository","message":"Resource not accessible by integration"},"headers":{"access-control-allow-origin":"*","access-control-expose-headers":"ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset","connection":"close","content-encoding":"gzip","content-security-policy":"default-src 'none'","content-type":"application/json; charset=utf-8","date":"Fri, 22 Jul 2022 11:37:30 GMT","referrer-policy":"origin-when-cross-origin, strict-origin-when-cross-origin","server":"GitHub.com","strict-transport-security":"max-age=31536000; includeSubdomains; preload","transfer-encoding":"chunked","vary":"Accept-Encoding, Accept, X-Requested-With","x-content-type-options":"nosniff","x-frame-options":"deny","x-github-media-type":"github.v3; format=json","x-github-request-id":"ABF8:0EFF:39DD5D:40D374:62DA8BFA","x-ratelimit-limit":"5000","x-ratelimit-remaining":"4975","x-ratelimit-reset":"1658491535","x-ratelimit-resource":"core","x-ratelimit-used":"25","x-xss-protection":"0"},"status":403,"url":"https://api.github.com/repos/xxxx/yyyy/actions/runs?status=queued"},"status":403},"exception":true,"level":"error","message":"unhandledRejection: Resource not accessible by integration\nHttpError: Resource not 
accessible by integration\n at /snapshot/cml/node_modules/@octokit/request/dist-node/index.js:86:21\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at async Job.doExecute (/snapshot/cml/node_modules/bottleneck/light.js:405:18)","os":{"loadavg":[1.05,0.6,0.24],"uptime":146.55},"process":{"argv":["/usr/bin/cml-internal","/snapshot/cml/bin/cml.js","runner","--name","cml-4l6sv1qiu1","--labels","debug","--idle-timeout","300","--driver","github","--repo","https://github.com/xxxx/yyyy","--token","ghs_wDbiPdDx3S0wjEm4hvt0v0v0v037P54OliM1","--tf-resource","eyJtb2RlIjoibWFuYWdlZCIsInR5cGUiOiJpdGVyYXRpdmVfY21sX3J1bm5lciIsIm5hbWUiOiJydW5uZXIiLCJwcm92aWRlciI6InByb3ZpZGVyW1wicmVnaXN0cnkudGVycmFmb3JtLmlvL2l0ZXJhdGl2ZS9pdGVyYXRpdmVcIl0iLCJpbnN0YW5jZXMiOlt7InByaXZhdGUiOiIiLCJzY2hlbWFfdmVyc2lvbiI6MCwiYXR0cmlidXRlcyI6eyJuYW1lIjoiY21sLTRsNnN2MXFpdTEiLCJsYWJlbHMiOiIiLCJpZGxlX3RpbWVvdXQiOjMwMCwicmVwbyI6IiIsInRva2VuIjoiIiwiZHJpdmVyIjoiIiwiY2xvdWQiOiJhd3MiLCJjdXN0b21fZGF0YSI6IiIsImlkIjoiaXRlcmF0aXZlLTJvNzh2ZXFjOHJrZ2kiLCJpbWFnZSI6IiIsImluc3RhbmNlX2dwdSI6IiIsImluc3RhbmNlX2hkZF9zaXplIjozNSwiaW5zdGFuY2VfaXAiOiIiLCJpbnN0YW5jZV9sYXVuY2hfdGltZSI6IiIsImluc3RhbmNlX3R5cGUiOiIiLCJyZWdpb24iOiJldS13ZXN0LTEiLCJzc2hfbmFtZSI6IiIsInNzaF9wcml2YXRlIjoiIiwic3NoX3B1YmxpYyI6IiIsImF3c19zZWN1cml0eV9ncm91cCI6IiJ9fV19"],"cwd":"/","execPath":"/usr/bin/cml-internal","gid":0,"memoryUsage":{"arrayBuffers":15632910,"external":33348698,"heapTotal":106082304,"heapUsed":75520952,"rss":311275520},"pid":2440,"uid":0,"version":"v16.16.0"},"stack":"HttpError: Resource not accessible by integration\n at /snapshot/cml/node_modules/@octokit/request/dist-node/index.js:86:21\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at async Job.doExecute 
(/snapshot/cml/node_modules/bottleneck/light.js:405:18)","trace":[{"column":21,"file":"/snapshot/cml/node_modules/@octokit/request/dist-node/index.js","function":null,"line":86,"method":null,"native":false},{"column":null,"file":null,"function":"runMicrotasks","line":null,"method":null,"native":false},{"column":5,"file":"node:internal/process/task_queues","function":"processTicksAndRejections","line":96,"method":null,"native":false},{"column":18,"file":"/snapshot/cml/node_modules/bottleneck/light.js","function":"async Job.doExecute","line":405,"method":"doExecute","native":false}]}
Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"level":"error","message":"HttpError: Resource not accessible by integration","stack":"Error: HttpError: Resource not accessible by integration\n at process.<anonymous> (/snapshot/cml/bin/cml/runner.js:333:32)\n at process.emit (node:events:539:35)\n at emit (node:internal/process/promises:140:20)\n at processPromiseRejections (node:internal/process/promises:274:27)\n at processTicksAndRejections (node:internal/process/task_queues:97:32)","status":"terminated"}
Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"level":"info","message":"Unregistering runner cml-4l6sv1qiu1..."}
Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"level":"error","message":"\tFailed: Bad request - Runner \"cml-4l6sv1qiu1\" is still running a job\""}
Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"level":"info","message":"Waiting 10 seconds to destroy"}
Jul 22 11:37:33 ip-172-31-32-70 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE
Jul 22 11:37:35 ip-172-31-32-70 systemd[1]: cml.service: Failed with result 'exit-code'.
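A minimal sketch for pulling only the error entries out of a journal dump like the one above. It assumes each line has the shape `<timestamp> <host> cml.sh[<pid>]: <JSON>` and that the JSON payload starts at the first `{`. The helper name `error_messages` is hypothetical and not part of CML, and the sample lines are shortened copies of the log above.

```python
import json


def error_messages(journal_lines):
    """Return the "message" field of every JSON log entry with level "error".

    Lines without a parseable JSON object (for example systemd's own
    messages, or fragments of a wrapped line) are skipped.
    """
    messages = []
    for line in journal_lines:
        start = line.find("{")
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("level") == "error":
            messages.append(entry.get("message", ""))
    return messages


# Shortened sample lines taken from the journal output above.
sample = [
    'Jul 22 11:37:18 ip-172-31-32-70 cml.sh[2440]: {"level":"info","message":"runner status","status":"ready"}',
    'Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"level":"error","message":"HttpError: Resource not accessible by integration","status":"terminated"}',
    "Jul 22 11:37:33 ip-172-31-32-70 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE",
]
print(error_messages(sample))
# ['HttpError: Resource not accessible by integration']
```

Filtering this way makes the 403 from the GitHub API stand out from the surrounding info-level noise.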
:wave: @mikolajpabiszczak the reason is that the runner has been set to do just one job via the --single parameter; the option that you might be looking for is --reuse
@DavidGOrtega: I do know that. So let me emphasise this again:
- The problem is not about single vs. reusable (I know and understand the difference between those). In both cases the workflow does not work (and it worked 3 months ago). I used reusable only to collect the logs provided and to see whether GitHub sees the runner (it does not: it marks it as offline). In fact, all the workflows (using CML) that I tested do not work (but worked 3 months ago).
- Moreover, if I use the reusable runner and try to run the failed job again, it does not pick up the already existing runner (because GitHub sees it as offline).
- In case I use single, the instance does not get cancelled after the failure; I have to terminate it manually.
(I added some clarifications in the opening message)
@mikolajpabiszczak You have in your logs
Jul 22 11:37:30 ip-172-31-32-70 cml.sh[2440]: {"level":"error","message":"HttpError: Resource not accessible by integration","stack":"Error: HttpError: Resource not accessible by integration\n at process.<anonymous> (/snapshot/cml/bin/cml/runner.js:333:32)\n at process.emit (node:events:539:35)\n at emit (node:internal/process/promises:140:20)\n at processPromiseRejections (node:internal/process/promises:274:27)\n at processTicksAndRejections (node:internal/process/task_queues:97:32)","status":"terminated"}
Could there be something that your token does not have permission to do? The unregistering then cannot happen because there is still a job in play.
Just to be sure and move one step forward, can you please check your REPO_TOKEN? Does it have all the permissions?
These were not changed since the working runs, but I checked them again. We are using a company application, so I checked against this list:
Repository level:
- [X] administration (read and write)
- [ ] checks (we are not using cml send-github-check)
- [X] pull requests (read and write)
Organisation level:
- [X] self-hosted runners (read and write)
Additionally, in the repository settings:
- all actions are allowed
- and workflows have read and write permissions
It looks like that app needs an additional scope it might not have? https://docs.github.com/en/rest/actions/workflow-runs#list-workflow-runs-for-a-repository
@mikolajpabiszczak to confirm that this is an issue with the app-generated token, can you try to curl the endpoint with one of the generated tokens?
curl \
-H "Accept: application/vnd.github+json" \
-H "Authorization: token <TOKEN>" \
https://api.github.com/repos/OWNER/REPO/actions/runs
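The same check can be sketched in Python with only the standard library, which makes the outcome a little easier to read than raw curl output. The helper names `explain` and `check_runs_access` are hypothetical, and the hint for the 403 case is an interpretation based on the error message in the logs above, not an official diagnosis.

```python
import json
import urllib.error
import urllib.request


def explain(status, message=""):
    """Turn the HTTP status (and GitHub's error message) into a short hint."""
    if status == 200:
        return "ok: token can list workflow runs"
    if status == 403 and "Resource not accessible by integration" in message:
        return "forbidden: the app token likely lacks the Actions permission"
    if status == 401:
        return "unauthorized: token is invalid or expired"
    return f"unexpected response: HTTP {status} {message}".strip()


def check_runs_access(owner, repo, token):
    """Call the 'list workflow runs' endpoint and explain the outcome."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/runs"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"token {token}",
        },
    )
    try:
        with urllib.request.urlopen(request) as response:
            return explain(response.status)
    except urllib.error.HTTPError as err:
        # GitHub returns a JSON body with a "message" field on errors.
        body = err.read().decode("utf-8", errors="replace")
        try:
            message = json.loads(body).get("message", "")
        except (json.JSONDecodeError, AttributeError):
            message = body
        return explain(err.code, message)
```

Calling `check_runs_access("OWNER", "REPO", "<TOKEN>")` with one of the generated tokens should reproduce the 403 seen in the runner logs if the token is the problem.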
we might need to update our guide for using a github app?

- https://cml.dev/doc/self-hosted-runners#app
- https://github.com/settings/apps/new
- https://docs.github.com/en/rest/overview/permissions-required-for-github-apps#permission-on-actions
- https://docs.github.com/en/rest/actions/workflow-runs#list-workflow-runs-for-a-repository
Did some tests, indeed the culprit was the lack of sufficient permissions: after adding Read and write permissions for Actions the workflows work again.
Thx for your time and help! And yes, the guide needs an update in this case. ;D
@mikolajpabiszczak thanks for the report and help, we'll keep this open until we update the docs
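For anyone hitting the same error, the permission set that ended up working in this thread can be written down as a quick check. This is only a sketch: the required set is gathered from the discussion above rather than from an official CML list, the key names are an assumption about how the GitHub API spells these permissions in an app installation's `permissions` object, and `missing_permissions` is a hypothetical helper.

```python
# Permission set that worked in this thread (assumption gathered from the
# discussion, not an official CML requirement list).
REQUIRED = {
    "actions": "write",
    "administration": "write",
    "pull_requests": "write",
    "organization_self_hosted_runners": "write",
}

# Higher number means broader access; an absent permission counts as 0.
LEVELS = {"read": 1, "write": 2}


def missing_permissions(granted, required=REQUIRED):
    """Return the required permissions that `granted` lacks or holds too weakly."""
    missing = {}
    for name, needed in required.items():
        have = granted.get(name)
        if LEVELS.get(have, 0) < LEVELS[needed]:
            missing[name] = needed
    return missing


# The app's permissions as described earlier in the thread, before the fix.
before_fix = {
    "administration": "write",
    "pull_requests": "write",
    "organization_self_hosted_runners": "write",
}
print(missing_permissions(before_fix))
# {'actions': 'write'}
```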