runner
System.NullReferenceException if no access to work directory
Describe the bug
If the --work directory of a self-hosted runner is accidentally not writable, the job fails but GitHub still treats the runner as busy. The error is not easily visible. It takes 15 minutes or more before the next job is scheduled, and the workflow run will not cancel. This slows down fixing the problem.
To Reproduce
Steps to reproduce the behavior:
- Add a new runner in the repository settings
- Start the runner with the --work parameter pointing to a read-only directory
- Run a job on the runner
Expected behavior
The GitHub UI, or at least the runner output, should show the cause of the problem. GitHub should also schedule the next workflow run, since the workflow has failed.
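Until the runner reports this itself, a pre-flight check run before starting the runner can surface the problem early. The following is a minimal sketch and not part of the runner: the helper name `check_work_dir` is hypothetical, and it only tests whether the current user can create the directory and a file inside it, which mirrors the `Directory.CreateDirectory` call that fails in the worker log.

```python
import os
import tempfile


def check_work_dir(path):
    """Return None if `path` is usable as a work directory, else an error message.

    Hypothetical helper for illustration; not part of actions/runner.
    """
    try:
        # Creating the directory is the step that failed in the worker log.
        os.makedirs(path, exist_ok=True)
        # Creating (and automatically deleting) a temporary file proves
        # that the current user can actually write into the directory.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        # PermissionError, NotADirectoryError, etc. are all OSError subclasses.
        return f"work directory {path!r} is not usable: {exc}"
    return None
```

For example, `check_work_dir("/var/lib/runner/work")` would be expected to return a "Permission denied" message for the setup described here, provided it is run as the same user as the runner. A check run as root may pass even when the runner's user cannot write.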
Runner Version and Platform
2.299.1 (the version the Add runner instructions told me to install yesterday). I first tried 2.300.2 and had the same problem, but did not find its logs.
Linux
What's not working?
Error handling for both the read-only directory and the NullReferenceException. The workflow run will not cancel.
Job Log Output
There is no output. The job is marked as failed, but the workflow remains active for 15 minutes or so.
Runner and Worker's Diagnostic Logs
Worker log
[2023-01-04 15:27:05Z INFO ExecutionContext] Initialize Env context
[2023-01-04 15:27:05Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[2023-01-04 15:27:05Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2023-01-04 15:27:05Z INFO HostContext] Well known directory 'Diag': '/home/runner/_diag'
[2023-01-04 15:27:05Z INFO JobRunner] Starting the job execution context.
[2023-01-04 15:27:05Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[2023-01-04 15:27:05Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2023-01-04 15:27:05Z INFO HostContext] Well known directory 'Work': '/var/lib/runner/work'
[2023-01-04 15:27:05Z INFO JobRunner] Validating directory permissions for: '/var/lib/runner/work'
[2023-01-04 15:27:05Z ERR JobRunner] System.UnauthorizedAccessException: Access to the path '/var/lib/runner/work' is denied.
---> System.IO.IOException: Permission denied
--- End of inner exception stack trace ---
at System.IO.FileSystem.CreateDirectory(String fullPath)
at System.IO.Directory.CreateDirectory(String path)
at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken)
[2023-01-04 15:27:05Z ERR JobRunner] #####################################################
[2023-01-04 15:27:05Z ERR JobRunner] System.IO.IOException: Permission denied
[2023-01-04 15:27:05Z INFO JobRunner] Shutting down the job server queue.
[2023-01-04 15:27:05Z INFO JobServerQueue] Fire signal to shutdown all queues.
[2023-01-04 15:27:06Z INFO JobServer] Successfully started websocket client.
[2023-01-04 15:27:06Z INFO JobServerQueue] All queue process task stopped.
[2023-01-04 15:27:06Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'ca395085-040a-526b-2ce8-bdc85f692774', success rate: 1/1.
[2023-01-04 15:27:06Z INFO JobServerQueue] Web console line queue drained.
[2023-01-04 15:27:06Z INFO JobServerQueue] Uploading 1 files in one shot.
[2023-01-04 15:27:06Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[2023-01-04 15:27:06Z INFO JobServerQueue] File upload queue drained.
[2023-01-04 15:27:07Z INFO JobServerQueue] Job timeline record has been updated for the first time.
[2023-01-04 15:27:07Z INFO JobServerQueue] Timeline update queue drained.
[2023-01-04 15:27:07Z INFO JobServerQueue] Disposing job server ...
[2023-01-04 15:27:07Z INFO JobServerQueue] All queue process tasks have been stopped, and all queues are drained.
[2023-01-04 15:27:07Z INFO Worker] Job completed.
[2023-01-04 15:27:07Z ERR Worker] System.NullReferenceException: Object reference not set to an instance of an object.
at GitHub.Runner.Worker.JobRunner.CompleteJobAsync(IJobServer jobServer, IExecutionContext jobContext, AgentJobRequestMessage message, Nullable`1 taskResult)
at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken)
at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken)
at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Runner log
[2023-01-04 15:27:01Z INFO JobNotification] Entering JobStarted Notification
[2023-01-04 15:27:01Z INFO JobNotification] Entering StartMonitor
[2023-01-04 15:27:07Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2023-01-04 15:27:07Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2023-01-04 15:27:07Z INFO ProcessInvokerWrapper] Finished process 170 with exit code 1, and elapsed time 00:00:06.3644220.
[2023-01-04 15:27:07Z INFO JobDispatcher] Worker finished for job ca395085-040a-526b-2ce8-bdc85f692774. Code: 1
[2023-01-04 15:27:07Z INFO JobDispatcher] Return code 1 indicate worker encounter an unhandled exception or app crash, attach worker stdout/stderr to JobRequest result.
[2023-01-04 15:27:07Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2023-01-04 15:27:08Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2023-01-04 15:27:09Z INFO JobDispatcher] finish job request for job ca395085-040a-526b-2ce8-bdc85f692774 with result: Failed
[2023-01-04 15:27:09Z INFO Terminal] WRITE LINE: 2023-01-04 15:27:09Z: Job build completed with result: Failed
[2023-01-04 15:27:09Z INFO JobDispatcher] Stop renew job request for job ca395085-040a-526b-2ce8-bdc85f692774.
[2023-01-04 15:27:09Z INFO JobDispatcher] job renew has been cancelled, stop renew job request 38.
[2023-01-04 15:27:09Z ERR JobDispatcher] Unhandled exception happened in worker:
[2023-01-04 15:27:09Z ERR JobDispatcher] System.NullReferenceException: Object reference not set to an instance of an object.
[2023-01-04 15:27:09Z INFO JobNotification] Entering JobCompleted Notification
[2023-01-04 15:27:09Z INFO JobNotification] Entering EndMonitor
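Because the cause is buried among INFO lines, a small filter over the diagnostic logs makes it easier to spot. This is a rough sketch that assumes every entry starts with a `[timestamp LEVEL Component]` header as in the excerpts above; the name `error_entries` is hypothetical. Lines without a header, such as stack frames, are attached to the ERR entry they follow.

```python
import re

# Entry header as seen in the logs above, e.g.
# "[2023-01-04 15:27:05Z ERR JobRunner] message"
ENTRY = re.compile(r"^\[(\S+ \S+Z) (\w+)\s+(\w+)\] ?(.*)$")


def error_entries(lines):
    """Yield (timestamp, component, message) for each ERR entry.

    Header-less continuation lines (stack frames) are appended to the
    message of the ERR entry they follow.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            # A new header ends the previous entry, if we were collecting one.
            if current:
                yield tuple(current)
            timestamp, level, component, message = match.groups()
            current = [timestamp, component, message] if level == "ERR" else None
        elif current:
            current[2] += "\n" + line
    if current:
        yield tuple(current)
```

Passing an open log file from the Diag directory shown in the worker log (`/home/runner/_diag` in this setup) to `error_entries` should yield the `UnauthorizedAccessException` first, ahead of the later `NullReferenceException`.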
I am experiencing the same issue with the x64 runner on Windows 11. Removing the read-only attribute or running it as Administrator does not help. I am using version 2.300.2.
UPDATE: I was able to work around the issue by installing an older actions-runner (2.298.x) and letting it auto-update. Currently, everything works as expected.
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.
I tried this with actions-runner-linux-x64-2.311.0.tar.gz and it seems to have been fixed in the meantime.
I no longer see the System.NullReferenceException, and the GitHub UI shows the failure and the exception information, including the path that it could not write to.