cdk-github-runners
Windows EC2 doesn't reach the stop state in the EC2 userdata script
Hey @kichik,
I have one special case that I can reproduce: although the job has been successfully completed in GitHub, the EC2 instance and the Step Functions execution are still running.
runner.log
Current runner version: '2.316.1'
2024-05-15 09:41:16Z: Listening for Jobs
2024-05-15 09:41:19Z: Running job: test_config
2024-05-15 10:05:49Z: Job test_config completed with result: Canceled
./run.cmd : An error occurred: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View
permissions to perform the action.
At C:\Windows\system32\config\systemprofile\AppData\Local\Temp\EC2Launch988827203\UserScript.ps1:48 char:3
+ ./run.cmd 2>&1 | Out-File -Encoding ASCII -Append /actions/runner.l ...
+ ~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (An error occurr...orm the action.:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
"Runner listener exit with retryable error, re-launch runner in 5 seconds."
"Restarting runner..."
1 file(s) copied.
? Connected to GitHub
Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner
registrations are automatically deleted for runners that have not connected to the service recently.
"Runner listener exit with terminated error, stop the service, no retry needed."
"Exiting runner..."
What's the problem:
The machine keeps running and we waste money until we notice it. (Yes, additional alerting would make sense in this case too, but I don't have it in place yet.)
Proposal:
It would be great to have a try/catch block around the action statement in PowerShell https://github.com/CloudSnorkel/cdk-github-runners/blob/f08da20f3fe70ae8fc86f85db304b15e191601f3/src/providers/ec2.ts#L165
to ensure the machine gets terminated https://github.com/CloudSnorkel/cdk-github-runners/blob/f08da20f3fe70ae8fc86f85db304b15e191601f3/src/providers/ec2.ts#L172
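To make the proposal concrete, here is a rough sketch of what I have in mind. It is not the actual code from `ec2.ts`: `buildWindowsUserData` is a hypothetical helper name, and the default shutdown command is my assumption of what the terminate step looks like. The idea is only that the shutdown sits in a `finally` block, so it still runs when `./run.cmd` raises an error like the `NativeCommandError` in the log above.

```typescript
// Hypothetical helper, NOT part of cdk-github-runners. It renders a PowerShell
// user data script in which the shutdown step runs even if the runner step fails.
function buildWindowsUserData(
  runCommand: string,
  // Assumed shutdown command; the real script may use something different.
  shutdownCommand: string = "Stop-Computer -ComputerName localhost -Force",
): string {
  const lines: string[] = [
    "<powershell>",
    "try {",
    `  ${runCommand}`,
    "} catch {",
    "  # Log the failure instead of letting it abort the whole script.",
    '  Write-Output "Runner step failed: $_"',
    "} finally {",
    "  # PowerShell runs this block whether or not the try block threw.",
    `  ${shutdownCommand}`,
    "}",
    "</powershell>",
  ];
  return lines.join("\n");
}

// Example: wrap the runner invocation that appears in the log above.
const userData = buildWindowsUserData(
  "./run.cmd 2>&1 | Out-File -Encoding ASCII -Append /actions/runner.log",
);
console.log(userData);
```

The `catch` block only logs, so the error is no longer fatal for the rest of the script. Whether the Step Functions task should be reported as failed in that case is a separate decision that I have left out here.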