aspire icon indicating copy to clipboard operation
aspire copied to clipboard

SQLServer `.WithDataVolume()` fails to start if you restart aspire too quickly

Open afscrome opened this issue 1 year ago • 1 comments

If you have a sql server configured WithDataVolume(), then if you try to restart aspire too quickly after closing the last session, then SQL server will fail to start up complaining that another instance is already running

2024-05-02T16:55:55.8514747 /opt/mssql/bin/sqlservr: Another instance of the application is already running.

This can be reproduced by restarting aspire whilst a session is already in progress. (e.g. by going to Debug > Start without Debugging in Visual Studio whilst the project is already running.)

The problem seems to be that the new instance of aspire will start the sql server container before the old one has terminated. If you run the following command, you'll see there's a period when there are two containers trying to access the same data volume

while($true) {
  docker ps --filter Volume=MYAPP-SqlServer-data
  Start-Sleep -Milliseconds 200
}

Example output:

CONTAINER ID   IMAGE      COMMAND                  CREATED          STATUS                  PORTS                       NAMES
a86f794e0650   REDACTED   "/bin/bash /app/entr…"   1 second ago     Up Less than a second   127.0.0.1:32796->1433/tcp   SqlServer-b16k6ig
d8f04623bf55   REDACTED   "/bin/bash /app/entr…"   26 seconds ago   Up 25 seconds           127.0.0.1:32794->1433/tcp   SqlServer-ae1fag8

When WithDataVolume() is used with sql server, aspire should and make sure the volume is no longer in use before starting the new instance of SQL Server.

afscrome avatar May 02 '24 16:05 afscrome

A similar race condition exists if you expose containers on a fixed port number. Restarting aspire quickly can cause the second attempt to fail if the first one hasn't fully died yet.

container.WithHttpEndpoint(port: 5001, targetPort: 8080, env: "ASPNETCORE_HTTP_PORTS")

For now I have been able to work around these issues with the below lifecycle hook. This (ab)uses WithEnvironment in a similar manner to WaitForDependencies:

NOTE: See Improved work around at https://github.com/dotnet/aspire/issues/4066#issuecomment-2155877090

// Work around for https://github.com/dotnet/aspire/issues/4066
internal class DockerWaitWorkaroundLifecycleHook(DistributedApplicationExecutionContext executionContext,
    ResourceLoggerService resourceLoggerService,
    ResourceNotificationService resourceNotificationService) : IDistributedApplicationLifecycleHook
{
    public Task BeforeStartAsync(DistributedApplicationModel appModel, CancellationToken cancellationToken = default)
    {
        if (executionContext.IsPublishMode)
        {
            return Task.CompletedTask;
        }

        foreach (var container in appModel.GetContainerResources()){
            
            var mounts = container.Annotations
                .OfType<ContainerMountAnnotation>()
                .Where(v => !string.IsNullOrWhiteSpace(v.Source))
                .ToList();
            foreach (var mount in mounts)
            {
                container.Annotations.Add(new EnvironmentCallbackAnnotation(x => WaitForNoMatchesToFilter(container, $"Volume={mount.Source!}")));
            }

            var endpoints = container.Annotations
                .OfType<EndpointAnnotation>()
                .Where(e => e.TargetPort.HasValue)
                .ToList();
            foreach (var endpoint in endpoints)
            {
                container.Annotations.Add(new EnvironmentCallbackAnnotation(x => WaitForNoMatchesToFilter(container, $"Expose={endpoint.TargetPort!}")));
            }
        }

        return Task.CompletedTask;
    }

    async Task WaitForNoMatchesToFilter(IResource resource, string filter)
    {
        resourceLoggerService.GetLogger(resource)?.LogInformation("Waiting for existing containers matching filter '{Filter}' to die", filter);
        await resourceNotificationService.PublishUpdateAsync(resource, s => s with
        {
            State = new("Waiting", KnownResourceStateStyles.Info)
        });

        for (int i = 0; i < 30; i++)
        {
            var isInUse = await DoesContainerExistWithFilter(filter);
            if (!isInUse)
            {
                return;
            }
            await Task.Delay(TimeSpan.FromSeconds(1));
        }

        throw new Exception($"Containers matching filter '{filter}' for '{resource.Name}' still exist after 30 seconds");

        static async Task<bool> DoesContainerExistWithFilter(string filter)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = "docker",
                ArgumentList =
                    {
                        "ps",
                        "--filter", filter,
                        "-q"
                    },
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
            };

            using var docker = Process.Start(startInfo);
            if (docker == null)
            {
                // Not sure what happened - continue and let aspire do whatever it would have done w/out this hack.
                return false;
            }
            await docker.WaitForExitAsync();

            var error = await docker.StandardError.ReadToEndAsync();
            var result = await docker.StandardOutput.ReadToEndAsync();
            if (docker.ExitCode > 0)
            {
                // Not sure what happened - continue and let aspire do whatever it would have done w/out this hack.
                return false;
            }

            return result.Length > 0;
        }
    }
}

public static class DockerWaitWorkaroundExtensions
{
    public static IDistributedApplicationBuilder WithVolumeSharingWorkaround(this IDistributedApplicationBuilder builder)
    {
        builder.Services.TryAddLifecycleHook<DockerWaitWorkaroundLifecycleHook>();
        return builder;
    }
}

afscrome avatar May 24 '24 15:05 afscrome

I have the same issue with using WithBindMount.

var sql = builder.AddSqlServer("sql", password: dbPassword, port: 63228)
    .WithBindMount("./sqlserverconfig", "/usr/config")
    .WithBindMount("../../../data/secrets","/var/opt/mssql/secrets")
    .WithBindMount("../../../data/data","/var/opt/mssql/data")
    .WithBindMount("../../../data/log","/var/opt/mssql/log")
    .WithEntrypoint("/usr/config/entrypoint.sh")
    .WithHealthCheck()
    .AddDatabase("databasename");

gregpakes avatar Jun 06 '24 07:06 gregpakes

@afscrome Your workaround works for me too. Thanks very much!

gregpakes avatar Jun 06 '24 08:06 gregpakes

From the orchestrator perspective we did make a change (that will land in 8.1) that makes the shutdown of containers significantly faster. Previously it was essentially serialized, now it is done in parallel. That alone hopefully will make the occurrence of this problem rare.

Checking for published container port conflicts or volume conflicts is an interesting idea, but runs into some issues in general case. By default Aspire containers are proxied, so there will be no host port conflicts for containers, only proxy port conflicts. Those are controlled by DCP orchestrator directly, and are released early in the shutdown cycle, so they are unlikely to cause conflicts, even if the application is restarted quickly. As for the volumes, the fact that more than some container share a volume may be "by design"--they may be shared to facilitate data exchange. So It is not clear to me if there is a way to automatically determine whether volume sharing is a problem, or how to model it if the user were to tell us that it is, for their container.

All that said, I am going to keep this issue open because ultimately we want the launch of Aspire applications to be as smooth as possible. We are discussing two new features that may help here. One is "long running" containers. That is, containers, such as database containers, that are not started over and over again during each AppHost run, but instead kept running and reused. Another feature is more polished "eventing" mechanism for Aspire app model, that would allow the workaround above to be implemented in much cleaner way, and potentially integrated with applicable components such as SQL Server component.

@afscrome @gregpakes would love to learn your thoughts on the above.

@mitchdenny FYI

karolz-ms avatar Jun 06 '24 21:06 karolz-ms

@karolz-ms my first thought is about host port conflicts. In my case I fix the host sql port. It makes accessing the database much easier through external tools. So I'm assuming host port conflicts are potentially an issue in this case?

gregpakes avatar Jun 07 '24 08:06 gregpakes

Whilst I get that there isn't a general way to solve this right now, I also can't help but think that this behaviour currently means that SQL Server and WithDataVolume is a pit of failure right now, so even if not solved generally I think there should be something for at least SQL Server. Whilst I'm sure the reductions in DCP stop time should help here, I have also seen sql server take 10-20secs to stop. I suspect most of this time is SQL doing it's graceful shutdown as opposed to aspire being blocked on initiating the shut down.

I also agree that checking for port / volume conflicts isn't a general purpose idea - in my work around, they were very much "the thing I am able to check" rather than "the thing I want to check". The thing I really wanted to check was "Is a previous version of this resource still running" - this I do think could be solved in a general way. As this can apply to any kind of scenario where you want (or are forced) to only have a single instance.

I imagine you'll need solving this problem anyway with the "long running" container support so this same point could be used to either:

  1. (Current): Start a new instance regardless of whether the old instance is still running
  2. (Long running): Reuse / Attach to the existing instance
  3. Wait for the old instance to exit before starting a new instance

Below is an generalised version of the above work around. Rather than look at volumes and ports in use, you instead mark a container as "Exclusive". When Aspire starts up, it adds net.dot.aspire.project and net.dot.aspire.resource labels to all containers describing where they originated from. On next startup, I can then query for any containers with matching labels and use those to block until the previous container has exited.

var builder = DistributedApplication.CreateBuilder(args);

var db = builder.AddSqlServer("sql")
    .AsExclusive()
    .AddDatabase("db");

builder.Build().Run();


public static class ExclusiveExtensions {

    public static IResourceBuilder<T> AsExclusive<T>(this IResourceBuilder<T> builder)
        where T : ContainerResource
    {
        builder.ApplicationBuilder.Services.TryAddLifecycleHook<ExclusiveContainerLifecycleHook>();
        return builder.WithAnnotation(new ExclusiveContainerAnnotation());
    }
}

public class ExclusiveContainerAnnotation : IResourceAnnotation
{
}


internal class ExclusiveContainerLifecycleHook(
    DistributedApplicationExecutionContext executionContext,
    ResourceLoggerService resourceLoggerService,
    ResourceNotificationService resourceNotificationService) : IDistributedApplicationLifecycleHook
{
    private static readonly string ProjectLabel = $"net.dot.aspire.project={Assembly.GetEntryAssembly()?.GetName().Name}";

    public Task BeforeStartAsync(DistributedApplicationModel appModel, CancellationToken cancellationToken = default)
    {
        if (executionContext.IsPublishMode)
        {
            return Task.CompletedTask;
        }

        foreach (var container in appModel.GetContainerResources())
        {
            var resourceLabel = $"net.dot.aspire.resource={container.Name}";
            // Add origin labels to all containers
            // Even if they're not used for exclusivity checks, they're useful for diagnostic purposes
            container.Annotations.Add(new ContainerRunArgsCallbackAnnotation(x => {
                x.Add($"--label");
                x.Add(ProjectLabel);
                x.Add($"--label");
                x.Add(resourceLabel);
            }));

            if (executionContext.IsRunMode
                && container.Annotations.OfType<ExclusiveContainerAnnotation>().Any())
            {
                container.Annotations.Add(new EnvironmentCallbackAnnotation(x => WaitForPreviousContainerToExit(container, resourceLabel)));
            }

        }

        return Task.CompletedTask;
    }


    private async Task WaitForPreviousContainerToExit(IResource resource, string resourceLabel)
    {
        resourceLoggerService.GetLogger(resource)?.LogInformation("Waiting for existing containers to exit");
        await resourceNotificationService.PublishUpdateAsync(resource, s => s with
        {
            State = new("Waiting", KnownResourceStateStyles.Info)
        });

        for (int i = 0; i < 30; i++)
        {
            var isInUse = await DoesContainerExistWithLabel(resourceLabel);
            if (!isInUse)
            {
                return;
            }
            await Task.Delay(TimeSpan.FromSeconds(1));
        }

        throw new Exception("Previous container still exist after 30 seconds");

        static async Task<bool> DoesContainerExistWithLabel(string resourceLabel)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = "docker",
                ArgumentList =
                    {
                        "ps",
                        "--filter", $"label={ProjectLabel}",
                        "--filter", $"label={resourceLabel}",
                        "-q"
                    },
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
            };

            using var docker = Process.Start(startInfo);
            if (docker == null)
            {
                return false;
            }
            await docker.WaitForExitAsync();

            var error = await docker.StandardError.ReadToEndAsync();
            var result = await docker.StandardOutput.ReadToEndAsync();
            if (docker.ExitCode > 0)
            {
                return false;
            }

            return result.Length > 0;
        }
    }
}

Another feature is more polished "eventing" mechanism for Aspire app model,

I'd love to learn more about this but if I start asking questions here, I'll derail this thread so I'll hold off from asking questions here.

afscrome avatar Jun 08 '24 08:06 afscrome

@afscrome we are starting work on allowing persistent containers which will make this much less of an issue for folks.

I do like the idea of adding some extra metadata to the containers though.

mitchdenny avatar Jun 10 '24 01:06 mitchdenny

Assuming it's related to https://github.com/dotnet/aspire/issues/3185

sebastienros avatar Jul 09 '24 08:07 sebastienros

We made improvements in this area but now persistent containers is the way to go here

dbreshears avatar Feb 14 '25 20:02 dbreshears