
Health checks fail and mark the target as unhealthy when a health check runs at the same time as a long-running script step in a deployment/runbook.

Open. IsaacCalligeros95 opened this issue 1 year ago.

Severity

Low, Workarounds available

Version

2023.1 -> *

Caused by the fix to https://github.com/OctopusDeploy/Issues/issues/8118

Latest Version

I could reproduce the problem in the latest build

What happened?

While a long-running script step is executing, health check tasks start but are then blocked by the ScriptIsolationMutex, waiting for the script step to finish. Since health check timeouts were added, this wait now causes the health checks to fail, and the machine is also marked as unhealthy.
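The interaction can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not Octopus's implementation: it assumes the ScriptIsolationMutex behaves like a single exclusive lock per target, and that the health check timeout bounds how long the health check will wait for that lock. The names `script_isolation_mutex`, `run_script_step`, `run_health_check` and `simulate` are hypothetical, and the durations are scaled down to seconds.

```python
import threading
import time

# Hypothetical stand-in for the per-target ScriptIsolationMutex (assumption).
script_isolation_mutex = threading.Lock()


def run_script_step(duration_s: float, started: threading.Event) -> None:
    """Simulate a long-running script step that holds the isolation mutex."""
    with script_isolation_mutex:
        started.set()           # signal that the mutex is now held
        time.sleep(duration_s)  # the long-running script body


def run_health_check(timeout_s: float) -> str:
    """Simulate a health check that must acquire the mutex within its timeout."""
    if not script_isolation_mutex.acquire(timeout=timeout_s):
        # The timeout expired while waiting on the script step: the task is
        # cancelled and, in this model, the target is reported as unhealthy.
        return "unhealthy: the task was cancelled"
    try:
        return "healthy"
    finally:
        script_isolation_mutex.release()


def simulate(script_s: float, timeout_s: float) -> str:
    """Start a script step, then run a health check while it holds the mutex."""
    started = threading.Event()
    step = threading.Thread(target=run_script_step, args=(script_s, started))
    step.start()
    started.wait()  # ensure the script step acquires the mutex first
    result = run_health_check(timeout_s)
    step.join()
    return result


if __name__ == "__main__":
    # Script outlasts the health check timeout -> the health check fails.
    print(simulate(script_s=2.0, timeout_s=0.5))  # unhealthy: the task was cancelled
    # Timeout outlasts the script -> the health check waits and then succeeds.
    print(simulate(script_s=0.5, timeout_s=2.0))  # healthy
```

In this model the health check only succeeds when its timeout is longer than the time the script step still holds the mutex, which is why raising the timeout (as in the Connect timeout workaround) avoids the failure.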

Reproduction

  • Configure a 'Run a Script' step to run on a deployment target; the script needs to run for more than 1 minute.
  • Trigger a deployment
  • Trigger a health check
  • Health check fails with a task cancelled exception

Error and Stacktrace

Waiting for the script in task ServerTasks-10661 to finish as that script requires that no other Octopus scripts are executing on this target simultaneously. 

The task was cancelled

More Information

This issue only occurs when a long-running script step runs in parallel with a health check. Because this scenario is uncommon (health checks are triggered once daily), this issue is being treated as low priority.

Workaround

Listening/Polling/SSH tentacles:

The timeout used for health checks can be increased by raising the Connect timeout setting in the machine policy.

Polling tentacles:

Polling tentacles additionally use the Polling Response Timeout (PollingRequestMaximumMessageProcessingTimeout), but that has since been feature-toggled off as part of the issue "Polling Response Timeout causes Tentacle to time out, when it doesn't need to". The setting will eventually be removed, which will reduce the overall health check timeout. This has been captured in the issue "Health check timeouts can not be increased by the PollingRequestMaximumMessageProcessingTimeout configured on the tentacle".

IsaacCalligeros95, Jan 16 '24 04:01