Issues
Issues copied to clipboard
Health Check Scripts can sometimes become hung indefinitely on a tentacle, blocking deployments from running.
Severity
No response
Version
Seen on 2023.2.3264
Latest Version
None
What happened?
Very occasionally health check powershell scripts became hung and do not progress, preventing deployment to that target. Until the health check to be manually cancelled.
Reproduction
Not possible
Error and Stacktrace
07:19:03 Verbose | Performing health check on machine
07:19:03 Verbose | Acquiring isolation mutex RunningScript with NoIsolation in ServerTasks-81966
07:19:03 Verbose | Executable directory is C:\Windows\system32\WindowsPowershell\v1.0
07:19:03 Verbose | Executable name or full path: C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe
07:19:03 Verbose | No user context provided. Running as current user.
07:19:03 Verbose | Starting C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe in working directory 'D:\Octopus\Work\123' using 'redacted' encoding running as 'redacted' with the same environment variables as the launching process
07:19:07 Verbose | Process C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe in D:\Octopus\Work\123 exited with code 0
07:19:07 Verbose | Exit code: 0
07:19:07 Verbose | Acquiring isolation mutex RunningScript with NoIsolation in ServerTasks-81966
07:19:07 Verbose | Executable directory is C:\Windows\system32\WindowsPowershell\v1.0
07:19:07 Verbose | Executable name or full path: C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe
07:19:07 Verbose | No user context provided. Running as current user.
07:19:07 Verbose | Starting C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe in working directory 'D:\Octopus\Work\124' using 'redacted' encoding running as 'redacted' with the same environment variables as the launching process
|
No output was seen for 2 days, when the health check was cancelled.
More Information
The issue here is that:
- The hung health check blocks deployment to the target.
- The default health check script typically takes 1s or less to run, however it is allowed to run for days.
Also worth noting:
- The default health check is idempotent making it a good candidate for re-trying the script when hung.
See also
- SC-68672
- https://github.com/OctopusDeploy/Issues/issues/8581
- https://github.com/OctopusDeploy/Issues/issues/8118 Note that the fix in this ticket has some drawbacks:
- Customers get alerts about "failed health checks" when really the target was healthy and we were able to communicate with it.
- The error only occurs very occasionally so retrying is probably a more accurate representation of the state of the tentacle.
- The fact that it cancels the health check means that we don't find out if the tentacle is in-fact in a state in which it can not run scripts and so is unhealty but left as healthy.
Workaround
Manually cancel the health check.