
SQLServer unexpectedly going healthy

Open afscrome opened this issue 1 year ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Describe the bug

I have a situation where I've set up an indirect wait loop between a SQL Server resource and its database. There is one explicit chain of waits, db.WaitFor(migrator.WaitFor(sql)), and whilst there's no direct sql.WaitFor(db), I believe there is a form of wait going on behind the scenes (something to do with Connection String ready events).

Usually this means I end up in a permanent state of sql being Started but not healthy, so the migrator waits indefinitely. image

But sometimes sql goes healthy, and so the migrator runs to completion (I've now seen this 3 times out of countless other runs). image

Expected Behavior

I'd expect the sql resource in the following example to either always go healthy or never go healthy. Having it go healthy only occasionally suggests a race condition somewhere.

Alternatively, if my assumption about there being a circular dependency of waits is wrong, then there's some kind of bug that stops sql from going healthy 100% of the time.

Steps To Reproduce

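var builder = DistributedApplication.CreateBuilder(args);

var sql = builder.AddSqlServer("sql");

// The migrator waits for the SQL Server resource.
var migrator = builder.AddExecutable("migrator", "pwsh", ".",
    "-NoProfile",
    "-NoLogo",
    "-NonInteractive",
    "-Command",
    "Write-Host 'Starting Migration'; Start-Sleep -Seconds 10; Write-Host 'Done'")
    .WaitFor(sql);

// The database, a child of sql, waits for the migrator.
var db = sql.AddDatabase("whatever")
    .WaitFor(migrator);

builder.Build().Run();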

Exceptions (if any)

No response

.NET Version info

No response

Anything else?

 <Sdk Name="Aspire.AppHost.Sdk" Version="9.0.0-preview.4.24477.2" />

afscrome avatar Sep 27 '24 21:09 afscrome

Actually, I think there may be something more broadly wrong with WaitFor on my machine.

If I simply have an AppHost with

    builder.AddSqlServer("sql")
        .AddDatabase("tempdb"); // Using `tempdb` as that db is guaranteed to already exist

or even more simply

    builder.AddSqlServer("sql");

then the resources refuse to go healthy. image

afscrome avatar Sep 27 '24 23:09 afscrome

cc @mitchdenny

radical avatar Sep 27 '24 23:09 radical

False alarm, this seems to be caused by #6002.

That said, I didn't get any information out of Aspire to help debug this issue. I'd hoped to get something out of the health check output, but the closest I could get was a loop of the following, which didn't give any useful information:

trce: Aspire.Hosting.ApplicationModel.ResourceNotificationService[0]
      Resource sql/sql-eekqfaye update published: ResourceType = Container, CreationTimeStamp = 2024-09-27T23:50:17, State = { Text = Running, Style = (null) }, HealthStatus = Unhealthy ExitCode = (null), EnvironmentVariables = { ACCEPT_EULA = Y, CONFIG_EDGE_BUILD = , MSSQL_PID = developer, MSSQL_RPC_PORT = 135, MSSQL_SA_PASSWORD = REDACTED, PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin }, Urls = { tcp = tcp://localhost:58066, tcp target port = tcp://127.0.0.1:32800 }, Properties = { container.image = mcr.microsoft.com/mssql/server:2022-latest, container.id = 1cd2cb64a0c81e1fceba14c2572baccc450e39f2777e4668e97ea42044c6d7ac, container.command = , container.args = System.Collections.Generic.List`1[System.String], container.ports = System.Collections.Immutable.ImmutableArray`1[System.Int32] }
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[103]
      Health check 'Aspire.Hosting.Health.ResourceNotificationHealthCheckPublisher' completed after 0.0968ms
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[101]
      Health check publisher processing completed after 0.817ms
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[100]
      Running health check publishers
dbug: Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedService[102]
      Running health check publisher 'Aspire.Hosting.Health.ResourceNotificationHealthCheckPublisher'
trce: Aspire.Hosting.ApplicationModel.ResourceNotificationService[0]
      Resource sql/sql update published: ResourceType = Container, CreationTimeStamp = (null), State = { Text = Hidden, Style = (null) }, HealthStatus = Unhealthy ExitCode = (null), EnvironmentVariables = {  }, Urls = {  }, Properties = { container.image = mcr.microsoft.com/mssql/server:2022-latest }
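For anyone trying to reproduce this kind of output, the following is a minimal sketch of one way to raise the log level for the categories that appear in the log above. It is not necessarily how the log above was captured. It assumes the code lives in an Aspire AppHost project's Program.cs, that the category names are the ones shown in the log, and that no other logging configuration overrides these filters.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var builder = DistributedApplication.CreateBuilder(args);

// Assumption: category names are copied from the log output above.
// More specific category filters take precedence over the default level.
builder.Services.AddLogging(logging =>
{
    // Resource state and HealthStatus updates (logged at trace level above).
    logging.AddFilter(
        "Aspire.Hosting.ApplicationModel.ResourceNotificationService",
        LogLevel.Trace);

    // Health check publisher activity (logged at debug level above).
    logging.AddFilter(
        "Microsoft.Extensions.Diagnostics.HealthChecks",
        LogLevel.Debug);
});

builder.AddSqlServer("sql");

builder.Build().Run();
```

As the log above shows, this level of output reports that the resource is Unhealthy but not why, so it is only a starting point for diagnosis.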

afscrome avatar Sep 28 '24 00:09 afscrome

I'm going to close this issue. #6002 handles the underlying network issue with Rancher, whilst the recent addition of Health Checks to the resource UI gives you a way to see why the health checks are failing.

image

afscrome avatar Oct 06 '24 18:10 afscrome