Allow Health Checks to run in parallel with deployments and runbooks on a tentacle target.
The enhancement
The Need
Currently, health checks on tentacle targets cannot run at the same time as a deployment. This means the one-second, idempotent, read-only health check script can be blocked for hours. The health check is then either stuck waiting or, with https://github.com/OctopusDeploy/Issues/issues/8118 applied, reported to the customer as failing.
If health checks could run in parallel with deployments then:
- Health checks would not be blocked by deployments.
- Health checks would not block deployments.
This would help when a deployment or health check script hangs indefinitely and would otherwise block the other indefinitely. For example, a hung health check could no longer block a deployment.
This would also reduce the number of running tasks, and so reduce the number of slots taken up in a customer's task cap.
Background
When a deployment script or health check script runs on a tentacle, a RunningScript mutex is taken on the tentacle. A deployment typically takes it with FullIsolation, while a health check takes it with NoIsolation. A FullIsolation mutex cannot be held while any NoIsolation mutex is held for the same name and, conversely, a NoIsolation mutex cannot be taken while a FullIsolation mutex is held.
When a deployment is already running a script on a target and a health check is kicked off, the following occurs:
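The locking rule above behaves like a named reader-writer lock. The Python sketch models it that way, assuming NoIsolation holders may overlap with each other and a FullIsolation holder excludes everyone else for the same name. `NamedScriptMutex`, `try_acquire` and `release` are hypothetical names for illustration, not the tentacle's actual API.

```python
import threading
from collections import defaultdict
from enum import Enum


class ScriptIsolationLevel(Enum):
    # Level names mirror the issue; treating NoIsolation as "shared" and
    # FullIsolation as "exclusive" is an assumption of this sketch.
    NO_ISOLATION = "NoIsolation"
    FULL_ISOLATION = "FullIsolation"


class NamedScriptMutex:
    """Hypothetical reader-writer style mutex keyed by name (e.g. "RunningScript")."""

    def __init__(self):
        self._guard = threading.Lock()
        self._shared = defaultdict(int)  # name -> number of NoIsolation holders
        self._exclusive = set()          # names currently held with FullIsolation

    def try_acquire(self, name, level):
        """Return True if the mutex was taken, False if the caller must wait and retry."""
        with self._guard:
            if name in self._exclusive:
                return False  # a FullIsolation holder blocks every other script
            if level is ScriptIsolationLevel.FULL_ISOLATION:
                if self._shared[name] > 0:
                    return False  # FullIsolation waits for all NoIsolation holders
                self._exclusive.add(name)
            else:
                self._shared[name] += 1  # NoIsolation holders may overlap
            return True

    def release(self, name, level):
        with self._guard:
            if level is ScriptIsolationLevel.FULL_ISOLATION:
                self._exclusive.discard(name)
            else:
                self._shared[name] -= 1


mutex = NamedScriptMutex()
deployment_ok = mutex.try_acquire("RunningScript", ScriptIsolationLevel.FULL_ISOLATION)
health_check_ok = mutex.try_acquire("RunningScript", ScriptIsolationLevel.NO_ISOLATION)
print(deployment_ok, health_check_ok)  # True False
```

In this model the health check's `try_acquire` keeps returning `False` for as long as the deployment holds `RunningScript`, which corresponds to the waiting described in the steps that follow.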
- The health check makes TCP calls to the tentacle, instructing it to run the health check script while holding the RunningScript mutex at level NoIsolation.
- Since a deployment script is already running on the tentacle, Octopus Server keeps making "GetStatus" RPC calls over TCP to the tentacle until the deployment is complete.
- If the deployment script runs for a long time, this can result in tens of thousands of additional calls sent to the tentacle.
- Finally, when the deployment script completes, the health check script starts.
The above assumes https://github.com/OctopusDeploy/Issues/issues/8118 is not applied.
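The number of GetStatus calls grows linearly with how long the deployment holds the mutex. This sketch assumes a fixed poll interval for the whole wait; the real interval (and any back-off) is not stated in this issue, and the 1-second value used here is purely illustrative.

```python
import math


def estimate_get_status_calls(deployment_seconds: float, poll_interval_seconds: float) -> int:
    """Rough count of GetStatus RPCs sent while a health check waits behind a deployment.

    Assumes the server polls at a fixed interval for the entire time the
    deployment script holds the RunningScript mutex (an assumption, not
    documented behaviour).
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return math.ceil(deployment_seconds / poll_interval_seconds)


# A 3-hour deployment polled once per second (illustrative numbers only).
print(estimate_get_status_calls(3 * 60 * 60, 1.0))  # 10800
```

Even at this modest, assumed polling rate a long deployment lands in the tens of thousands of calls, all of which would be unnecessary if the health check did not have to wait.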
It is not clear why health checks cannot run in parallel with deployments, since health checks (if using the default script) do not modify the tentacle. It is also not clear whether sending potentially tens of thousands of RPC calls to the tentacle was intentionally chosen over running the scripts in parallel.
This enhancement could easily be feature-toggled, either with an environment variable or in the machine policy. A toggle may make sense because customers can provide their own custom health check scripts.
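One possible shape for such a toggle, sketched in Python: if either a hypothetical environment variable or a hypothetical machine policy flag is set, the health check takes a differently named mutex, so it never contends with the `RunningScript` mutex held by deployments. The variable name `OCTOPUS_PARALLEL_HEALTH_CHECKS`, the `MachinePolicy` field and the `HealthCheck` mutex name are all assumptions, not existing settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical toggle; no such environment variable exists today.
PARALLEL_HEALTH_CHECK_ENV_VAR = "OCTOPUS_PARALLEL_HEALTH_CHECKS"


@dataclass
class MachinePolicy:
    # Hypothetical machine policy setting, off by default so custom
    # health check scripts keep today's serialized behaviour.
    allow_parallel_health_checks: bool = False


def health_check_mutex_name(policy: MachinePolicy, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the mutex name a health check should take (illustrative only)."""
    env = os.environ if env is None else env
    enabled_by_env = env.get(PARALLEL_HEALTH_CHECK_ENV_VAR, "").strip().lower() in ("1", "true")
    if enabled_by_env or policy.allow_parallel_health_checks:
        # A separate (hypothetical) name never contends with deployments.
        return "HealthCheck"
    # Current behaviour: share the RunningScript mutex with deployments.
    return "RunningScript"


print(health_check_mutex_name(MachinePolicy(), env={}))  # RunningScript
```

Using a separate mutex name sidesteps the fact that the existing lock only distinguishes isolation levels, not task types; because a custom health check script might modify the target, the sketch leaves the toggle off by default.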
Links
https://github.com/OctopusDeploy/Issues/issues/8581
https://github.com/OctopusDeploy/Issues/issues/8582
https://github.com/OctopusDeploy/Issues/issues/8118
https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1676257279255989
[SC-68672]
> It is not clear why health checks can not run in parallel with Deployments since health checks (if using the default script) do not modify the tentacle. It is not clear if sending potentially 10s of thousands of RPC calls to the tentacle was intentionally chosen over running the scripts in parallel.
This is an assumption we would need to validate. @droyad, any insights?
I believe the historic default was that only one thing can run on a target at a time. Over time we've added things that don't need that.
> It is not clear why health checks can not run in parallel
I think this mixes cause and effect. Deployments don't allow anything else to run at the same time, so health checks can't run in parallel with them (it is not that health checks specifically can't run while deployments occur).
The current locking mechanism doesn't allow discriminating based on task type, i.e. it's not possible to say that health checks can run but other deployments can't.