SQLServer `.WithDataVolume()` fails to start if you restart aspire too quickly
If you have a SQL Server resource configured with `WithDataVolume()`, and you restart Aspire too quickly after closing the previous session, SQL Server will fail to start, complaining that another instance is already running:
```
2024-05-02T16:55:55.8514747 /opt/mssql/bin/sqlservr: Another instance of the application is already running.
```
This can be reproduced by restarting aspire whilst a session is already in progress. (e.g. by going to Debug > Start without Debugging in Visual Studio whilst the project is already running.)
The problem seems to be that the new instance of Aspire starts the SQL Server container before the old one has terminated. If you run the following command, you'll see there's a window during which two containers are attached to the same data volume:
```powershell
while ($true) {
    docker ps --filter Volume=MYAPP-SqlServer-data
    Start-Sleep -Milliseconds 200
}
```
Example output:
```
CONTAINER ID   IMAGE      COMMAND                  CREATED          STATUS                  PORTS                       NAMES
a86f794e0650   REDACTED   "/bin/bash /app/entr…"   1 second ago     Up Less than a second   127.0.0.1:32796->1433/tcp   SqlServer-b16k6ig
d8f04623bf55   REDACTED   "/bin/bash /app/entr…"   26 seconds ago   Up 25 seconds           127.0.0.1:32794->1433/tcp   SqlServer-ae1fag8
```
When `WithDataVolume()` is used with SQL Server, Aspire should wait until the volume is no longer in use before starting the new instance of SQL Server.
A similar race condition exists if you expose containers on a fixed host port. Restarting Aspire quickly can cause the second instance to fail to start if the first one hasn't fully exited yet:
```csharp
container.WithHttpEndpoint(port: 5001, targetPort: 8080, env: "ASPNETCORE_HTTP_PORTS")
```
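Pending a proper fix, the same wait can also be done from outside the AppHost. Below is a rough shell sketch; the volume name and port are the examples from this thread, and it assumes the docker CLI is on PATH and the lowercase `volume=`/`publish=` filter keys:

```shell
# Block until no running container matches a `docker ps` filter, e.g. a
# container still holding a data volume or still publishing a fixed host port.
wait_for_container_gone() {
  filter="$1" # e.g. "volume=MYAPP-SqlServer-data" or "publish=5001"
  for _ in $(seq 1 30); do
    # `docker ps -q` prints one ID per matching *running* container;
    # empty output means nothing matches any more.
    if [ -z "$(docker ps -q --filter "$filter")" ]; then
      return 0
    fi
    sleep 1
  done
  echo "containers matching '$filter' still running after 30s" >&2
  return 1
}

# Usage before restarting Aspire:
#   wait_for_container_gone "volume=MYAPP-SqlServer-data"
#   wait_for_container_gone "publish=5001"
```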
For now I have been able to work around these issues with the lifecycle hook below. It (ab)uses `WithEnvironment` in a similar manner to WaitForDependencies.

NOTE: See the improved workaround at https://github.com/dotnet/aspire/issues/4066#issuecomment-2155877090
```csharp
// Workaround for https://github.com/dotnet/aspire/issues/4066
using System.Diagnostics;
using Aspire.Hosting.Lifecycle;
using Microsoft.Extensions.Logging;

internal class DockerWaitWorkaroundLifecycleHook(
    DistributedApplicationExecutionContext executionContext,
    ResourceLoggerService resourceLoggerService,
    ResourceNotificationService resourceNotificationService) : IDistributedApplicationLifecycleHook
{
    public Task BeforeStartAsync(DistributedApplicationModel appModel, CancellationToken cancellationToken = default)
    {
        if (executionContext.IsPublishMode)
        {
            return Task.CompletedTask;
        }

        foreach (var container in appModel.GetContainerResources())
        {
            // Before starting, wait for any old container still using one of our volumes / bind mounts
            var mounts = container.Annotations
                .OfType<ContainerMountAnnotation>()
                .Where(v => !string.IsNullOrWhiteSpace(v.Source))
                .ToList();

            foreach (var mount in mounts)
            {
                container.Annotations.Add(new EnvironmentCallbackAnnotation(x => WaitForNoMatchesToFilter(container, $"Volume={mount.Source!}")));
            }

            // ...and for any old container still exposing one of our target ports
            var endpoints = container.Annotations
                .OfType<EndpointAnnotation>()
                .Where(e => e.TargetPort.HasValue)
                .ToList();

            foreach (var endpoint in endpoints)
            {
                container.Annotations.Add(new EnvironmentCallbackAnnotation(x => WaitForNoMatchesToFilter(container, $"Expose={endpoint.TargetPort!}")));
            }
        }

        return Task.CompletedTask;
    }

    private async Task WaitForNoMatchesToFilter(IResource resource, string filter)
    {
        resourceLoggerService.GetLogger(resource)?.LogInformation("Waiting for existing containers matching filter '{Filter}' to die", filter);
        await resourceNotificationService.PublishUpdateAsync(resource, s => s with
        {
            State = new("Waiting", KnownResourceStateStyles.Info)
        });

        for (int i = 0; i < 30; i++)
        {
            var isInUse = await DoesContainerExistWithFilter(filter);
            if (!isInUse)
            {
                return;
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
        }

        throw new Exception($"Containers matching filter '{filter}' for '{resource.Name}' still exist after 30 seconds");

        static async Task<bool> DoesContainerExistWithFilter(string filter)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = "docker",
                ArgumentList = { "ps", "--filter", filter, "-q" },
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
            };

            using var docker = Process.Start(startInfo);
            if (docker == null)
            {
                // Not sure what happened - continue and let aspire do whatever it would have done w/out this hack.
                return false;
            }

            await docker.WaitForExitAsync();
            var error = await docker.StandardError.ReadToEndAsync();
            var result = await docker.StandardOutput.ReadToEndAsync();
            if (docker.ExitCode > 0)
            {
                // Not sure what happened - continue and let aspire do whatever it would have done w/out this hack.
                return false;
            }

            // Any output from `docker ps -q` means a matching container is still running
            return result.Length > 0;
        }
    }
}

public static class DockerWaitWorkaroundExtensions
{
    public static IDistributedApplicationBuilder WithVolumeSharingWorkaround(this IDistributedApplicationBuilder builder)
    {
        builder.Services.TryAddLifecycleHook<DockerWaitWorkaroundLifecycleHook>();
        return builder;
    }
}
```
I have the same issue when using `WithBindMount`:
```csharp
var sql = builder.AddSqlServer("sql", password: dbPassword, port: 63228)
    .WithBindMount("./sqlserverconfig", "/usr/config")
    .WithBindMount("../../../data/secrets", "/var/opt/mssql/secrets")
    .WithBindMount("../../../data/data", "/var/opt/mssql/data")
    .WithBindMount("../../../data/log", "/var/opt/mssql/log")
    .WithEntrypoint("/usr/config/entrypoint.sh")
    .WithHealthCheck()
    .AddDatabase("databasename");
```
@afscrome Your workaround works for me too. Thanks very much!
From the orchestrator perspective we did make a change (that will land in 8.1) that makes the shutdown of containers significantly faster. Previously it was essentially serialized, now it is done in parallel. That alone hopefully will make the occurrence of this problem rare.
Checking for published container port conflicts or volume conflicts is an interesting idea, but runs into some issues in the general case. By default Aspire containers are proxied, so there will be no host port conflicts for containers, only proxy port conflicts. Those are controlled by the DCP orchestrator directly, and are released early in the shutdown cycle, so they are unlikely to cause conflicts even if the application is restarted quickly. As for volumes, the fact that more than one container shares a volume may be "by design"--they may be shared to facilitate data exchange. So it is not clear to me whether there is a way to automatically determine that volume sharing is a problem, or how to model it if the user were to tell us that it is for their container.
All that said, I am going to keep this issue open because ultimately we want the launch of Aspire applications to be as smooth as possible. We are discussing two new features that may help here. One is "long running" containers: containers, such as database containers, that are not started over and over again during each AppHost run, but instead kept running and reused. The other is a more polished "eventing" mechanism for the Aspire app model, which would allow the workaround above to be implemented in a much cleaner way, and potentially integrated with applicable components such as the SQL Server component.
@afscrome @gregpakes would love to learn your thoughts on the above.
@mitchdenny FYI
@karolz-ms my first thought is about host port conflicts. In my case I fix the host sql port. It makes accessing the database much easier through external tools. So I'm assuming host port conflicts are potentially an issue in this case?
Whilst I get that there isn't a general way to solve this right now, I also can't help thinking that the current behaviour makes SQL Server with `WithDataVolume` a pit of failure, so even if this isn't solved generally, there should be something for at least SQL Server. Whilst I'm sure the reductions in DCP stop time will help here, I have also seen SQL Server take 10-20 seconds to stop; I suspect most of that time is SQL doing its graceful shutdown, as opposed to Aspire being blocked on initiating the shutdown.
I also agree that checking for port / volume conflicts isn't a general purpose idea - in my workaround, they were very much "the thing I am able to check" rather than "the thing I want to check". The thing I really wanted to check was "is a previous version of this resource still running?" - and that, I think, could be solved in a general way, as it applies to any scenario where you want (or are forced) to have only a single instance.
I imagine you'll need to solve this problem anyway for the "long running" container support, so the same decision point could be used to either:
- (Current): Start a new instance regardless of whether the old instance is still running
- (Long running): Reuse / Attach to the existing instance
- Wait for the old instance to exit before starting a new instance
Below is a generalised version of the above workaround. Rather than looking at volumes and ports in use, you mark a container as "Exclusive". When Aspire starts up, it adds net.dot.aspire.project and net.dot.aspire.resource labels to all containers, describing where they originated. On the next startup, I can then query for any containers with matching labels and block until the previous container has exited.
```csharp
var builder = DistributedApplication.CreateBuilder(args);

var db = builder.AddSqlServer("sql")
    .AsExclusive()
    .AddDatabase("db");

builder.Build().Run();
```
```csharp
using System.Diagnostics;
using System.Reflection;
using Aspire.Hosting.Lifecycle;
using Microsoft.Extensions.Logging;

public static class ExclusiveExtensions
{
    public static IResourceBuilder<T> AsExclusive<T>(this IResourceBuilder<T> builder)
        where T : ContainerResource
    {
        builder.ApplicationBuilder.Services.TryAddLifecycleHook<ExclusiveContainerLifecycleHook>();
        return builder.WithAnnotation(new ExclusiveContainerAnnotation());
    }
}

public class ExclusiveContainerAnnotation : IResourceAnnotation
{
}

internal class ExclusiveContainerLifecycleHook(
    DistributedApplicationExecutionContext executionContext,
    ResourceLoggerService resourceLoggerService,
    ResourceNotificationService resourceNotificationService) : IDistributedApplicationLifecycleHook
{
    private static readonly string ProjectLabel = $"net.dot.aspire.project={Assembly.GetEntryAssembly()?.GetName().Name}";

    public Task BeforeStartAsync(DistributedApplicationModel appModel, CancellationToken cancellationToken = default)
    {
        if (executionContext.IsPublishMode)
        {
            return Task.CompletedTask;
        }

        foreach (var container in appModel.GetContainerResources())
        {
            var resourceLabel = $"net.dot.aspire.resource={container.Name}";

            // Add origin labels to all containers.
            // Even if they're not used for exclusivity checks, they're useful for diagnostic purposes.
            container.Annotations.Add(new ContainerRunArgsCallbackAnnotation(x =>
            {
                x.Add("--label");
                x.Add(ProjectLabel);
                x.Add("--label");
                x.Add(resourceLabel);
            }));

            if (executionContext.IsRunMode
                && container.Annotations.OfType<ExclusiveContainerAnnotation>().Any())
            {
                container.Annotations.Add(new EnvironmentCallbackAnnotation(x => WaitForPreviousContainerToExit(container, resourceLabel)));
            }
        }

        return Task.CompletedTask;
    }

    private async Task WaitForPreviousContainerToExit(IResource resource, string resourceLabel)
    {
        resourceLoggerService.GetLogger(resource)?.LogInformation("Waiting for existing containers to exit");
        await resourceNotificationService.PublishUpdateAsync(resource, s => s with
        {
            State = new("Waiting", KnownResourceStateStyles.Info)
        });

        for (int i = 0; i < 30; i++)
        {
            var isInUse = await DoesContainerExistWithLabel(resourceLabel);
            if (!isInUse)
            {
                return;
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
        }

        throw new Exception("Previous container still exists after 30 seconds");

        static async Task<bool> DoesContainerExistWithLabel(string resourceLabel)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = "docker",
                ArgumentList =
                {
                    "ps",
                    "--filter", $"label={ProjectLabel}",
                    "--filter", $"label={resourceLabel}",
                    "-q"
                },
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
            };

            using var docker = Process.Start(startInfo);
            if (docker == null)
            {
                return false;
            }

            await docker.WaitForExitAsync();
            var error = await docker.StandardError.ReadToEndAsync();
            var result = await docker.StandardOutput.ReadToEndAsync();
            if (docker.ExitCode > 0)
            {
                return false;
            }

            return result.Length > 0;
        }
    }
}
```
> Another feature is more polished "eventing" mechanism for Aspire app model

I'd love to learn more about this, but I'll hold off on asking questions here so as not to derail the thread.
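For anyone landing on this thread later: Aspire 9 shipped an app-model eventing API. A rough sketch of how the wait could hook it, assuming the `builder.Eventing.Subscribe<BeforeResourceStartedEvent>` overload and a hypothetical `WaitForPreviousContainerToExitAsync` helper (the wait loop from the workaround above, not an Aspire API):

```csharp
var builder = DistributedApplication.CreateBuilder(args);

var sql = builder.AddSqlServer("sql").WithDataVolume();

// Runs just before this one resource is started, so the wait no longer
// needs to piggyback on an environment callback.
builder.Eventing.Subscribe<BeforeResourceStartedEvent>(sql.Resource, async (e, ct) =>
{
    // Hypothetical helper: polls `docker ps` until the previous
    // container with matching labels has exited.
    await WaitForPreviousContainerToExitAsync(e.Resource, ct);
});

builder.Build().Run();
```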
@afscrome we are starting work on allowing persistent containers which will make this much less of an issue for folks.
I do like the idea of adding some extra metadata to the containers though.
Assuming it's related to https://github.com/dotnet/aspire/issues/3185
We made improvements in this area, but now persistent containers are the way to go here.
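For reference, persistent containers shipped in Aspire 9 as container lifetimes. A minimal sketch, assuming the `WithLifetime(ContainerLifetime.Persistent)` API:

```csharp
var builder = DistributedApplication.CreateBuilder(args);

// A persistent lifetime keeps the container (and its data volume) running
// across AppHost restarts, so there is no stop/start race to lose.
var db = builder.AddSqlServer("sql")
    .WithDataVolume()
    .WithLifetime(ContainerLifetime.Persistent)
    .AddDatabase("db");

builder.Build().Run();
```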