
Transient error drops the subscription. (Connection reset by peer)

Open PehrGit opened this issue 1 year ago • 6 comments

Describe the bug

We noticed that a subscription had stopped processing. The cause was a SqlException whose inner exception chain ends in a SocketException, with the message:

"A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught) Unable to read data from the transport connection: Connection reset by peer. Connection reset by peer".

I believe this should be recognized as a transient error and retried, like the other error numbers listed in https://github.com/Eventuous/eventuous/blob/b7352bb3b6565dd974b74a35655d782cea08dc08/src/SqlServer/src/Eventuous.SqlServer/Subscriptions/SqlServerSubscriptionBase.cs#L61
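A minimal sketch of what such a check could look like, assuming the condition is detected by walking the inner-exception chain instead of relying on the SQL error number. `TransientErrors.IsConnectionReset` is a hypothetical helper, not part of Eventuous or Microsoft.Data.SqlClient; it uses only `SocketException.SocketErrorCode` from the .NET base class library and treats both `ConnectionReset` (10054) and `ConnectionAborted` (10053) as transient.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper for illustration; not part of Eventuous.
public static class TransientErrors {
    // Returns true when any exception in the inner-exception chain is a
    // SocketException that reports a reset or aborted connection.
    public static bool IsConnectionReset(Exception exception) {
        for (var e = exception; e != null; e = e.InnerException) {
            if (e is SocketException {
                    SocketErrorCode: SocketError.ConnectionReset or SocketError.ConnectionAborted
                }) {
                return true;
            }
        }

        return false;
    }
}
```

A check like this could sit next to the existing error-number list, so that a SqlException wrapping an IOException wrapping a SocketException, as in the stack trace in this issue, is retried. Whether every connection reset is safe to retry is a judgment call for the maintainers.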

To Reproduce

Steps to reproduce the behavior:

  • Have SQL Server subscription
  • Have server close the connection

Expected behavior

The error is recognized as transient and the message is retried.

Screenshots

N/A

Desktop (please complete the following information):

  • OS: Linux Azure App Service
  • Eventuous version 0.15.0-beta.7

Additional context

There is no additional logging because this didn't occur during the processing of a message. It happened in the middle of the night, when nobody was using the system, so we assume it was just a hiccup on the Azure side.

Full stack trace:

Microsoft.Data.SqlClient.SqlException:
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteReaderAsync (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Eventuous.SqlServer.Subscriptions.SqlServerSubscriptionBase`1+<PollingQuery>d__15.MoveNext (Eventuous.SqlServer, Version=0.15.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Eventuous.SqlServer.Subscriptions.SqlServerSubscriptionBase`1+<PollingQuery>d__15.MoveNext (Eventuous.SqlServer, Version=0.15.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Eventuous.SqlServer.Subscriptions.SqlServerSubscriptionBase`1+<PollingQuery>d__15.MoveNext (Eventuous.SqlServer, Version=0.15.0.0, Culture=neutral, PublicKeyToken=null)
Inner exception System.IO.IOException handled at Microsoft.Data.SqlClient.SqlCommand.EndExecuteReaderAsync:
   at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.ThrowException (System.Net.Sockets, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult (System.Net.Sockets, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at Microsoft.Data.SqlClient.SNI.SNINetworkStream+<ReadAsync>d__1.MoveNext (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Data.SqlClient.SNI.SslOverTdsStream+<ReadAsync>d__5.MoveNext (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Security.SslStream+<EnsureFullTlsFrameAsync>d__186`1.MoveNext (System.Net.Security, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Net.Security.SslStream+<ReadAsyncInternal>d__188`1.MoveNext (System.Net.Security, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Data.SqlClient.SNI.SNISslStream+<ReadAsync>d__1.MoveNext (Microsoft.Data.SqlClient, Version=5.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
Inner exception System.Net.Sockets.SocketException handled at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.ThrowException:

PehrGit, Jan 23 '24 12:01

I will accept the PR if you add this error to the list you mentioned. I am not sure that the SQL error number will be 35, though.

alexeyzimarev, Jan 23 '24 20:01

Look also here https://github.com/dotnet/SqlClient/issues/2103#issuecomment-1764206103, it seems that on Windows it will produce 10053, but on Linux it's impossible to figure out.

alexeyzimarev, Jan 23 '24 20:01

Strangely enough, the only thing that should have happened is that the subscription would drop and resubscribe. Can you confirm that the subscription just silently died? Do you have any health checks set up using the provided diagnostics methods? https://eventuous.dev/docs/subscriptions/subs-diagnostics/#health-checks

alexeyzimarev, Jan 23 '24 20:01

> Strangely enough, the only thing that should have happened is that the subscription would drop and resubscribe. Can you confirm that the subscription just silently died? Do you have any health checks set up using the provided diagnostics methods? https://eventuous.dev/docs/subscriptions/subs-diagnostics/#health-checks

I've tested again by running the app locally and stopping the SqlServer instance. I see the "Dropped" message in the logs but it doesn't resubscribe, and the health check keeps outputting "Healthy".

It makes sense, as the Resubscribe() method is only called from Dropped(), which is not called when the polling connection fails. https://github.com/Eventuous/eventuous/blob/b7352bb3b6565dd974b74a35655d782cea08dc08/src/SqlServer/src/Eventuous.SqlServer/Subscriptions/SqlServerSubscriptionBase.cs#L63-L96

It is only called from HandleInternal, and only in the case of an OperationCanceledException https://github.com/Eventuous/eventuous/blob/b7352bb3b6565dd974b74a35655d782cea08dc08/src/Core/src/Eventuous.Subscriptions/EventSubscriptionWithCheckpoint.cs#L41-L54

PehrGit, Jan 24 '24 15:01

> Look also here dotnet/SqlClient#2103 (comment), it seems that on Windows it will produce 10053, but on Linux it's impossible to figure out.

Ah that's too bad. Thanks for looking into that!

I suppose we should focus on getting the SQL subscription to resubscribe when the connection drops. That should also fix this issue?

PehrGit, Jan 24 '24 15:01

Looking at the ESDB AllStreamSubscription, I see that EventSubscription.Dropped() is called when the subscription drops.

Could it be that we just need to replace IsDropped = true; with a call to .Dropped() in this method?

Edit: it looks like this is already fixed in dev, where Dropped(DropReason.ServerError, e); is called in the catch: https://github.com/Eventuous/eventuous/blob/0be16566922589befc985e61f750ef88c071641c/src/Relational/src/Eventuous.Sql.Base/Subscriptions/SqlSubscriptionBase.cs#L51-L83
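The pattern discussed here, routing a polling failure through `Dropped(...)` rather than only setting `IsDropped = true`, can be sketched as follows. This is an illustration under assumptions, not the Eventuous implementation: `PollingSubscriptionSketch`, its `DropReason` enum, and `ResubscribeCount` are hypothetical stand-ins, and `Resubscribe()` only counts calls where a real subscription would restart polling.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for illustration; these are not the Eventuous types.
public enum DropReason { Stopped, ServerError }

public class PollingSubscriptionSketch {
    readonly Func<CancellationToken, Task> _pollOnce;

    public PollingSubscriptionSketch(Func<CancellationToken, Task> pollOnce) => _pollOnce = pollOnce;

    public bool IsDropped { get; private set; }
    public int ResubscribeCount { get; private set; }

    public async Task PollingLoop(CancellationToken ct) {
        IsDropped = false;

        while (!ct.IsCancellationRequested) {
            try {
                // One polling iteration; throws when the connection fails.
                await _pollOnce(ct);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested) {
                // Normal shutdown: leave the loop without resubscribing.
                return;
            }
            catch (Exception e) {
                // Route the failure through Dropped so that recovery runs,
                // instead of only flagging the subscription as dropped.
                Dropped(DropReason.ServerError, e);
                return;
            }
        }
    }

    void Dropped(DropReason reason, Exception exception) {
        IsDropped = true;
        Console.WriteLine($"Dropped: {reason} ({exception.Message})");

        if (reason != DropReason.Stopped) Resubscribe();
    }

    // A real subscription would restart polling here, typically after a delay.
    void Resubscribe() => ResubscribeCount++;
}
```

With a polling delegate that throws, for example an `IOException("Connection reset by peer")`, the loop ends with `IsDropped` set and exactly one `Resubscribe()` call, which is the behavior that was missing when the catch block only set the flag.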

PehrGit, Jan 24 '24 15:01