"Connection is not open" / "Exception while reading from stream"
We are getting a handful of instances of the exception "Connection is not open" / "Exception while reading from stream" in our production systems, but I'm not sure of exactly what is going on.
This is the first exception we see logged; the stack trace stops at Npgsql.NpgsqlCommand.ExecuteReader.
Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
---> System.TimeoutException: Timeout during reading attempt
at Npgsql.Internal.NpgsqlReadBuffer.<Ensure>g__EnsureLong|41_0(NpgsqlReadBuffer buffer, Int32 count, Boolean async, Boolean readingNotifications)
at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|211_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|211_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
The second exception is below:
System.InvalidOperationException: Connection is not open
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
at Marten.DefaultRetryPolicy.TryAsync[T](Func`1 operation, CancellationToken token)
at Marten.Internal.Sessions.QuerySession.ExecuteReaderAsync(NpgsqlCommand command, CancellationToken token)
at Baseline.Exceptions.ExceptionTransformExtensions.TransformAndThrow(IEnumerable`1 transforms, Exception ex)
at Baseline.Exceptions.ExceptionTransforms.TransformAndThrow(Exception ex)
at Marten.Exceptions.MartenExceptionTransformer.WrapAndThrow(NpgsqlCommand command, Exception exception)
at Marten.Internal.Sessions.QuerySession.handleCommandException(NpgsqlCommand cmd, Exception e)
at Marten.Internal.Sessions.QuerySession.ExecuteReaderAsync(NpgsqlCommand command, CancellationToken token)
at Marten.Linq.MartenLinqQueryProvider.ExecuteHandlerAsync[T](IQueryHandler`1 handler, CancellationToken token)
at Marten.Linq.MartenLinqQueryProvider.ExecuteAsync[TResult](Expression expression, CancellationToken token, ResultOperatorBase op)
at [App stack]
The code triggering this issue looks like this:
public async Task<TenantResource> InvokeAsync(IDocumentSession db)
{
    var tenant = await db.Query<Tenant>()
        .Where(t => t.ApiId.Equals(this.TenantId!.Value))
        .SingleOrDefaultAsync();
    return Mapper.MapFrom(tenant);
}
The IDocumentSession is injected from IoC. We use the default retry policy, and the connection is managed by Marten (by passing in a connection string in the StoreOptions).
I'm not sure whether the second exception is hiding the first, or whether the retry is not being handled fully correctly. It appears the timeout is being hit (default 30s), and that is triggering a Connection is not open exception.
I've not reproduced locally yet.
We are using v5.0.0 of Marten.
@barclayadam I'm prepared to be very wrong, but that looks like transient network errors to me. It's very possible that the retry policy is knocking out the connection on the 2nd try though. Might be worthwhile to try Polly instead and use a little bit of a backoff policy. I'll peek at the mechanics sometime to see if there's anything obvious about the connection mechanics on retries.
@jeremydmiller Based on the frequency and our investigation, I think you are right that the root cause is transient. I think the issue here is the response to that transient exception.
We moved from v3 with a Polly retry handler to v5 and the built-in retry policy in an attempt to get rid of this issue. Although the frequency seems to have dropped, the same issue continues.
I think this may have to wait until Marten 7. I think we ditch the lambda-centric retry approach, as that's inefficient. The retry loop probably also needs to be moved up quite a bit in the stack, so that you can retry the whole "open connection, retry command" sequence rather than trying to retry just NpgsqlCommand.ExecuteReaderAsync() and the like.
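To illustrate the difference between retrying only the command and retrying the whole "open connection, run command" sequence, here is a minimal, self-contained sketch. It does not use Marten or Npgsql: `FakeConnection`, `RetryDemo`, and their members are hypothetical stand-ins, and the assumption that a read timeout leaves the connection closed is modeled on the two stack traces above rather than taken from the Npgsql source.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a database connection; NOT an Npgsql or Marten type.
public sealed class FakeConnection
{
    public bool IsOpen { get; private set; }

    public void Open() => IsOpen = true;

    // Simulates a command. When failTransiently is true, the read times out
    // and (by assumption) the connection is left closed afterwards.
    public Task<int> QueryAsync(bool failTransiently)
    {
        if (!IsOpen) throw new InvalidOperationException("Connection is not open");
        if (failTransiently)
        {
            IsOpen = false;
            throw new TimeoutException("Timeout during reading attempt");
        }
        return Task.FromResult(42);
    }
}

public static class RetryDemo
{
    // Runs unitOfWork up to maxAttempts times, retrying only on TimeoutException.
    // The attempt number (1-based) is passed in so the demo can fail on attempt 1.
    public static async Task<T> RetryAsync<T>(Func<int, Task<T>> unitOfWork, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await unitOfWork(attempt);
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Transient failure: fall through and try again.
            }
        }
    }

    // Retries only the command on one shared connection. The second attempt
    // hits the closed connection and throws "Connection is not open".
    public static Task<int> QueryWithSharedConnectionAsync()
    {
        var conn = new FakeConnection();
        conn.Open();
        return RetryAsync(attempt => conn.QueryAsync(failTransiently: attempt == 1), maxAttempts: 3);
    }

    // Retries the whole unit of work: every attempt opens its own connection.
    public static Task<int> QueryWithFreshConnectionAsync()
    {
        return RetryAsync(async attempt =>
        {
            var conn = new FakeConnection();
            conn.Open();
            return await conn.QueryAsync(failTransiently: attempt == 1);
        }, maxAttempts: 3);
    }
}
```

In this sketch, `QueryWithSharedConnectionAsync` surfaces an `InvalidOperationException` on the retry, which mirrors the pair of exceptions reported in this issue, while `QueryWithFreshConnectionAsync` recovers because the connection is reopened as part of each attempt.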
From notes:
- What's existing doesn't completely work because we don't reset the connection, so the retries just fail over and over again
- Look at what's built into NpgsqlDataSource
- Defaults, guidance, configuration for whether or not it would be okay for the session to reset the connection
- Guidance around error handling, recommend exponential backoff
- Polly add on for Marten???
- More documentation about using Polly and error handling advice
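As a rough illustration of the exponential backoff mentioned in the notes above, the delay between retries might be computed as follows. This is only a sketch: the `Backoff` class, the `DelayFor` method, and the base and maximum delays are hypothetical and are not part of Marten, Npgsql, or Polly.

```csharp
using System;

// Hypothetical helper; not part of Marten, Npgsql, or Polly.
public static class Backoff
{
    // Delay before retry number `attempt` (1-based):
    // baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        var factor = Math.Pow(2, attempt - 1);
        var milliseconds = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

A retry loop could then wait with `await Task.Delay(Backoff.DelayFor(attempt, baseDelay, maxDelay))` before reopening the connection and trying again. With a 100 ms base and a 2 s cap, the first three delays are 100 ms, 200 ms, and 400 ms. A production policy would likely also add random jitter so that many clients do not retry in lockstep.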
Rolling this into a bigger issue #2887