SQLServer unexpectedly going healthy
Is there an existing issue for this?
- [X] I have searched the existing issues
Describe the bug
I have a situation where I've set up an indirect wait loop between a SQL Server resource and its database. There is one explicit chain of waits, `db.WaitFor(migrator.WaitFor(sql))`, and whilst there's no direct `sql.WaitFor(db)`, I believe there is a form of wait going on behind the scenes (something to do with Connection String ready events).
Usually this means I end up in a permanent state of sql being Started but not healthy (and so the migrator waits indefinitely).
But sometimes sql goes healthy, and so the migrator runs to completion (I've now seen this 3 times out of countless attempts).
Expected Behavior
I'd expect the sql resource in the following example to either always go healthy or never go healthy. Having it go healthy only occasionally smells of a race condition somewhere.
Alternatively, if my assumption about a circular dependency of waits is wrong, then some other bug is stopping SQL from going healthy 100% of the time.
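To make the suspected loop concrete, here is a minimal sketch that writes the waits down as a graph and looks for a cycle. It is plain C# and does not call any Aspire API. The `sql` to `db` edge is my assumption about an implicit wait, not confirmed behaviour, and `waitsFor` and `FindCycle` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each entry reads "key waits for values".
// The first two edges come from the explicit WaitFor calls in the repro.
// The third edge is an ASSUMPTION: a hidden wait of sql on db.
var waitsFor = new Dictionary<string, string[]>
{
    ["db"] = new[] { "migrator" },
    ["migrator"] = new[] { "sql" },
    ["sql"] = new[] { "db" }, // assumed, not confirmed
};

var cycle = FindCycle(waitsFor);
Console.WriteLine(cycle.Count > 0
    ? "Cycle: " + string.Join(" -> ", cycle)
    : "No cycle");

// Depth-first search; returns the first cycle found, or an empty list if there is none.
List<string> FindCycle(Dictionary<string, string[]> graph)
{
    var path = new List<string>();    // nodes on the current DFS path
    var done = new HashSet<string>(); // nodes fully explored without finding a cycle
    var result = new List<string>();

    bool Visit(string node)
    {
        if (done.Contains(node)) return false;

        int index = path.IndexOf(node);
        if (index >= 0)
        {
            // The node is already on the current path, so the path loops back on itself.
            result.AddRange(path.Skip(index));
            result.Add(node);
            return true;
        }

        path.Add(node);
        if (graph.TryGetValue(node, out var dependencies))
        {
            foreach (var dependency in dependencies)
            {
                if (Visit(dependency)) return true;
            }
        }
        path.RemoveAt(path.Count - 1);
        done.Add(node);
        return false;
    }

    foreach (var start in graph.Keys)
    {
        if (Visit(start)) break;
    }
    return result;
}
```

With the assumed edge in place the sketch reports a cycle through all three resources. Without it the graph is acyclic, in which case the waits alone would not explain sql staying unhealthy.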
Steps To Reproduce
```csharp
var sql = builder.AddSqlServer("sql");

var migrator = builder.AddExecutable("migrator", "pwsh", ".",
        "-NoProfile",
        "-NoLogo",
        "-NonInteractive",
        "-Command",
        "Write-Host 'Starting Migration'; Start-Sleep -Seconds 10; Write-Host 'Done'")
    .WaitFor(sql);

var db = sql.AddDatabase("whatever")
    .WaitFor(migrator);
```
Exceptions (if any)
No response
.NET Version info
No response
Anything else?
```xml
<Sdk Name="Aspire.AppHost.Sdk" Version="9.0.0-preview.4.24477.2" />
```
Actually, I think there may be something more broadly wrong with WaitFor on my machine.
If I simply have an app host with
```csharp
builder.AddSqlServer("sql")
    .AddDatabase("tempdb"); // Using `tempdb` as that db is guaranteed to already exist
```
or even more simply
```csharp
builder.AddSqlServer("sql");
```
then the resources refuse to go healthy.
cc @mitchdenny
False alarm, this seems to be caused by #6002.
That said, I didn't get any information out of Aspire to help debug this issue. I'd hoped to get something out of the health check output, but the closest I could get was a loop of the following, which didn't give any useful information:
```
trce: Aspire.Hosting.ApplicationModel.ResourceNotificationService[0]
Resource sql/sql-eekqfaye update published: ResourceType = Container, CreationTimeStamp = 2024-09-27T23:50:17, State = { Text = Running, Style = (null) }, HealthStatus = Unhealthy ExitCode = (null), EnvironmentVariables = { ACCEPT_EULA = Y, CONFIG_EDGE_BUILD = , MSSQL_PID = developer, MSSQL_RPC_PORT = 135, MSSQL_SA_PASSWORD = REDACTED, PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin }, Urls = { tcp = tcp://localhost:58066, tcp target port = tcp://127.0.0.1:32800 }, Properties = { container.image = mcr.microsoft.com/mssql/server:2022-latest, container.id = 1cd2cb64a0c81e1fceba14c2572baccc450e39f2777e4668e97ea42044c6d7ac, container.command = , container.args = System.Collections.Generic.List`1[System.String], container.ports = System.Collections.Immutable.ImmutableArray`1[System.Int32] }
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[103]
Health check 'Aspire.Hosting.Health.ResourceNotificationHealthCheckPublisher' completed after 0.0968ms
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[101]
Health check publisher processing completed after 0.817ms
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[100]
Running health check publishers
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[102]
Running health check publisher 'Aspire.Hosting.Health.ResourceNotificationHealthCheckPublisher'
trce: Aspire.Hosting.ApplicationModel.ResourceNotificationService[0]
Resource sql/sql update published: ResourceType = Container, CreationTimeStamp = (null), State = { Text = Hidden, Style = (null) }, HealthStatus = Unhealthy ExitCode = (null), EnvironmentVariables = { }, Urls = { }, Properties = { container.image = mcr.microsoft.com/mssql/server:2022-latest }
```
I'm going to close this issue. #6002 handles the underlying network issue with Rancher, whilst the recent addition of health checks to the resource UI gives you a way to see why the health checks are failing.