Connection Test Health Checks are blocked by deployments on a Tentacle, which is surprising for a "connection test".
Severity
No response
Version
Since time immemorial
Latest Version
I could reproduce the problem in the latest build
What happened?
When a health check is configured with Only perform connection test (useful for raw scripting), it is still blocked by running deployments on the Tentacle.
The connection test is blocked waiting to acquire a NoIsolation level RunningScript mutex on the Tentacle. The script that runs on the Tentacle is a no-op exit 0;. It doesn't make sense for a no-op script to take the same mutex that a deployment uses.
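The blocking described above can be sketched with a toy model. This is only an illustration, not Tentacle's code: `running_script_mutex`, `run_script` and `simulate` are hypothetical names, and a plain `threading.Lock` stands in for the RunningScript mutex without modelling isolation levels. It shows that a script doing no work still waits for as long as the other holder keeps the mutex.

```python
import threading
import time

# Hypothetical stand-in for the Tentacle's RunningScript mutex.
# A plain lock is an assumption; real isolation levels are not modelled.
running_script_mutex = threading.Lock()


def run_script(name, work_seconds, waits):
    """Acquire the shared mutex, 'run' the script, record the time spent waiting."""
    requested = time.monotonic()
    with running_script_mutex:
        waits[name] = time.monotonic() - requested
        time.sleep(work_seconds)  # 0 seconds models the no-op script


def simulate(deployment_seconds=0.5):
    """Start a deployment, then a connection test that needs the same mutex."""
    waits = {}
    deployment = threading.Thread(
        target=run_script, args=("deployment", deployment_seconds, waits)
    )
    deployment.start()
    time.sleep(0.05)  # give the deployment time to take the mutex first
    connection_test = threading.Thread(
        target=run_script, args=("connection_test", 0.0, waits)
    )
    connection_test.start()
    deployment.join()
    connection_test.join()
    return waits


if __name__ == "__main__":
    waits = simulate()
    print(f"connection test waited {waits['connection_test']:.2f}s to run a no-op")
```

In this model the connection test's wait tracks the deployment's duration, even though its own work is effectively zero.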
Reproduction
Start a long-running, single-step deployment against a Tentacle, then kick off a connection test health check. The Only perform connection test health check waits for the deployment to complete, even though the Tentacle has already been communicated with successfully.
Error and Stacktrace
No response
More Information
The issues here are:
- Health checks run longer than they should, in some cases resulting in day-long health checks.
- The health check takes up a "task slot", preventing actual deployments from running.
- Customers get confused: they see a health check running for a long time and assume it is the reason their deployment is not running. In reality, an existing deployment is still running, which blocks both the health check and the next deployment to that target.
- With the fixes from https://github.com/OctopusDeploy/Issues/issues/8118, customers get alerts about "failed health checks" when the target was actually healthy and we were able to communicate with it.
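The "task slot" effect in the list above can be illustrated with a small tick-based model. It is a sketch under loud assumptions: `Task`, `simulate`, the first-come slot filling and the one-mutex-per-target rule are all hypothetical simplifications, not how Octopus Server actually schedules tasks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    target: str
    work_ticks: int                    # ticks of real work once the mutex is held
    started_at: Optional[int] = None   # tick at which the task took a task slot
    finished_at: Optional[int] = None  # tick at which the task released its slot


def simulate(tasks, task_cap):
    """Hypothetical scheduler: fixed number of slots, one mutex per target."""
    queue = list(tasks)
    running = []
    mutex_owner = {}  # target name -> task currently holding that target's mutex
    tick = 0
    while queue or running:
        # Fill free task slots in queue order.
        while queue and len(running) < task_cap:
            task = queue.pop(0)
            task.started_at = tick
            running.append(task)
        for task in list(running):
            # A task that cannot get the target mutex keeps its slot but does no work.
            if mutex_owner.setdefault(task.target, task) is not task:
                continue
            task.work_ticks -= 1
            if task.work_ticks <= 0:
                task.finished_at = tick + 1
                del mutex_owner[task.target]
                running.remove(task)
        tick += 1
    return {t.name: (t.started_at, t.finished_at) for t in tasks}


if __name__ == "__main__":
    result = simulate(
        [
            Task("deploy-1", target="A", work_ticks=5),
            Task("health-check", target="A", work_ticks=1),
            Task("deploy-2", target="B", work_ticks=2),
        ],
        task_cap=2,
    )
    for name, (started, finished) in result.items():
        print(f"{name}: slot taken at tick {started}, released at tick {finished}")
```

With a cap of two slots, the health check holds its slot for the whole of `deploy-1` while doing one tick of work, and `deploy-2` cannot start until a slot frees up, even though it goes to a different target.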
This is similar to "Connectivity checks running indefinitely blocking execution on a deployment target", but differs in that this issue is about the connectivity check having surprising behaviour.
SC-68672
Workaround
No response