aspire
aspire copied to clipboard
SQL Timeouts with proxied connections
I've been facing a number of sparodic SQL timeouts - performance generally seems fine, but a number of timeouts show up in the logs. On a hunch, I tried disabling proxying of the sql endpoint and since then I haven't seen a timeout again.
//Try to mitigate timeouts
var endpoint= sql.Resource.Annotations.OfType<EndpointAnnotation>()
.Single(x => x.Name == sql.Resource.PrimaryEndpoint.EndpointName);
endpoint.IsProxied = false;
Any thoughts on what could be causing the timeouts, or how to trace down the issue.
We are currently using System.SqlClient
. We do have a branch in flight that's upgrading to Microsoft.SqlClient
, but tha'ts not quite ready to test with yet. We're seeing this in a couple of apps - both of which are windows exes talking to SQL running in a container.
I haven't caught any of these timeouts myself - I find them mainly in the logs from background tasks - I wonder if there is some kind of connection idle / time out that's not being fully handled.
Exceptions are all of the following form:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.SqlClient.SqlCommand.ExecuteReader()
at MYAPP.FOO.BAR ...
cc @karolz-ms @danegsta
Upgraded to Microsoft.Data.SqlClient
and the timeouts still persist
After adding ;Pooling=false
to the connection string, and I haven't seen timeouts for 2 hours now, so it does seem like this is something to do with connection reuse - perhaps connection idle time differing between the app, sql server and the proxy.
Got a repro.
- Spin up an Aspire app with sql server in aspire
var builder = DistributedApplication.CreateBuilder(args);
var password = builder.AddParameter("SqlPassword", true);
builder.AddSqlServer("SqlServer", password: password, port: 14431);
await builder.Build().RunAsync();
- Spin up a console app, replacing the password on line 1 with your password:
var password = "";
var cs = $"Server=127.0.0.1,14431;User ID=sa;Password={password};TrustServerCertificate=true";
var interval = TimeSpan.FromMilliseconds(500);
var concurrency = 10;
var cts = new CancellationTokenSource();
var startTime = DateTime.Now;
try
{
await Task.WhenAll(Enumerable.Range(0, concurrency).Select(_ => DoWork()));
}
finally
{
Console.WriteLine($"Start: {startTime}");
Console.WriteLine($"End: {DateTime.UtcNow}");
}
async Task DoWork()
{
await Task.Yield();
while (true)
{
using (var connection = new SqlConnection(cs))
{
await connection.OpenAsync();
var cmd = new SqlCommand("select 1", connection);
try
{
await cmd.ExecuteNonQueryAsync();
Console.WriteLine($"{connection.ClientConnectionId} @ {DateTime.Now}");
}
catch (Exception ex)
{
cts.Cancel();
Console.Beep();
Console.Beep();
Console.Beep();
throw new Exception($"CONNECTION FAILED: {connection.ClientConnectionId} @ {DateTime.Now}", ex);
}
}
await Task.Delay(interval + TimeSpan.FromMilliseconds(Random.Shared.Next(-200,200)));
if (cts.IsCancellationRequested)
{
return;
}
}
}
I've run this app 5 times now and reproduced 5 times, taking anywhere from 1 minute to 5 minutes before failing. Also reproed on a second non work machine to try and rule out any funny VPN configurations - on my personal machine I reproed in roughly 5 mins.
I can reproduce. Investigating...
I think I know what the problem is. The proxy has a bug that may cause it to fail to send a packet when an internal heartbeat occurs between a read from the client and a write to the server. 😬
Thanks for digging in @karolz-ms! Thanks for poking at this @afscrome !
The fix has been merged into DCP main branch and it is on its way into first Aspire service release. Thanks again to @afscrome for spotting the issue and a very good repro; it helped a ton.