Health checks fail and mark the target as unhealthy when a health check runs at the same time as a long-running script step in a deployment/runbook.
Severity
Low, Workarounds available
Version
2023.1 -> *
Caused by the fix to https://github.com/OctopusDeploy/Issues/issues/8118
Latest Version
I could reproduce the problem in the latest build
What happened?
While a long-running script step is executing, health check tasks start executing but are blocked by the ScriptIsolationMutex, waiting for the script step to finish. With the addition of health check timeouts, the blocked health check now times out and fails, and the machine is marked as unhealthy.
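The interaction described above can be sketched as a small simulation. This is a minimal illustration and not Octopus code: `script_isolation_mutex`, `run_script_step`, `run_health_check`, and `simulate` are hypothetical names, a plain `threading.Lock` stands in for the ScriptIsolationMutex, and the durations are scaled down to fractions of a second.

```python
import threading
import time

# Hypothetical stand-in for the per-target ScriptIsolationMutex.
script_isolation_mutex = threading.Lock()


def run_script_step(duration_s: float, started: threading.Event) -> None:
    """Hold the mutex for the whole duration of a long-running script."""
    with script_isolation_mutex:
        started.set()
        time.sleep(duration_s)


def run_health_check(timeout_s: float) -> str:
    """Wait for the mutex, giving up once the health check timeout elapses."""
    if not script_isolation_mutex.acquire(timeout=timeout_s):
        return "unhealthy"  # timed out while still waiting on the script step
    try:
        return "healthy"
    finally:
        script_isolation_mutex.release()


def simulate(script_duration_s: float, health_check_timeout_s: float) -> str:
    """Start a script step, then run a health check against the same target."""
    started = threading.Event()
    script = threading.Thread(
        target=run_script_step, args=(script_duration_s, started)
    )
    script.start()
    started.wait()  # make sure the script holds the mutex first
    result = run_health_check(health_check_timeout_s)
    script.join()
    return result
```

Under these assumptions, `simulate(0.5, 0.1)` models a health check whose timeout is shorter than the remaining script run time, and `simulate(0.1, 2.0)` models a timeout long enough to outlast the script.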
Reproduction
- Configure a 'Run a Script' step to run on a deployment target. The script needs to run for more than 1 minute.
- Trigger a deployment
- Trigger a health check while the script step is still running
- The health check fails with a task cancelled exception
Error and Stacktrace
Waiting for the script in task ServerTasks-10661 to finish as that script requires that no other Octopus scripts are executing on this target simultaneously.
The task was cancelled
More Information
This issue only occurs when a long-running script step runs in parallel with a health check. As this scenario is uncommon (health checks are triggered once daily), it is being treated as a low priority.
Workaround
Listening/Polling/SSH tentacles:
The timeout used for health checks can be raised by increasing the Connect timeout setting in the machine policy.
Polling tentacles:
For polling tentacles, the health check timeout also includes the Polling Response Timeout (aka PollingRequestMaximumMessageProcessingTimeout), but that has since been feature toggled off in the issue "Polling Response Timeout causes Tentacle to time out, when it doesn't need to". Eventually the setting will be removed, reducing the overall health check timeout. This has been captured in the issue "Health check timeouts can not be increased by the PollingRequestMaximumMessageProcessingTimeout configured on the tentacle".