Polly.Contrib.WaitAndRetry
Remora.Discord currently uses WaitAndRetry for its Discord gateway sockets, but the sockets frequently end up in an invalid state.
Summary: When running Discord bots, the sockets sometimes get into an invalid state, usually about once every few hours.
Expected behavior:
The socket should not go into the invalid 'Aborted' state; it should go into CloseSent instead, so that no WebSocketException is thrown.
Actual behaviour:
Transient error in gateway client: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseSent'
System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseSent'
at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates)
at System.Net.WebSockets.ManagedWebSocket.ReceiveAsync(ArraySegment`1 buffer, CancellationToken cancellationToken)
--- End of stack trace from previous location ---
at Polly.Retry.AsyncRetryEngine.ImplementationAsync[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Func`5 onRetryAsync, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider, Boolean continueOnCapturedContext)
at Polly.AsyncPolicy`1.ExecuteAsync(Func`3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Remora.Discord.Gateway.Transport.WebSocketPayloadTransportService.ReceivePayloadAsync(CancellationToken ct)
at Remora.Discord.Gateway.Transport.WebSocketPayloadTransportService.ReceivePayloadAsync(CancellationToken ct)
at Remora.Discord.Gateway.DiscordGatewayClient.GatewayReceiverAsync(CancellationToken disconnectRequested)
Steps / Code to reproduce the problem:
Sadly, I have never used WaitAndRetry myself, but I think asking @Nihlus for example code that reproduces the issue could help narrow down the cause.
"Aborted" means the connection was closed for some external reason (network blip, server-side crash, gremlins, cosmic rays, etc.). It is an unavoidable fact of network communication and not something either Remora or Polly can fix.
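For anyone hitting this in their own code, here is a rough sketch of how an aborted socket could be handled. It is not how Remora.Discord or Polly does it. It uses only `System.Net.WebSockets` from the BCL, and `GatewayReconnectSketch`, `CanReceive`, `RunAsync`, `gatewayUri`, the buffer size, and the delay values are all hypothetical names and numbers picked for illustration. The idea is that `Aborted` is a terminal state, so retrying `ReceiveAsync` on the same socket cannot succeed; the socket has to be disposed and a new connection opened.

```csharp
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class GatewayReconnectSketch
{
    // ReceiveAsync is only valid in these two states, as the exception
    // message in the report says ("Valid states are: 'Open, CloseSent'").
    public static bool CanReceive(WebSocketState state) =>
        state == WebSocketState.Open || state == WebSocketState.CloseSent;

    // Hypothetical receive loop: when the socket is aborted or closed, the
    // socket is disposed and a fresh connection is made after a delay.
    // Cancelling ct ends the loop by letting OperationCanceledException
    // propagate to the caller.
    public static async Task RunAsync(Uri gatewayUri, CancellationToken ct)
    {
        var buffer = new byte[4096];              // assumed buffer size
        var delay = TimeSpan.FromSeconds(1);      // assumed initial delay

        while (!ct.IsCancellationRequested)
        {
            using var socket = new ClientWebSocket();
            try
            {
                await socket.ConnectAsync(gatewayUri, ct);
                delay = TimeSpan.FromSeconds(1);  // reset backoff once connected

                while (CanReceive(socket.State))
                {
                    var result = await socket.ReceiveAsync(
                        new ArraySegment<byte>(buffer), ct);

                    if (result.MessageType == WebSocketMessageType.Close)
                    {
                        break;                    // remote side closed cleanly
                    }

                    // Payload handling would go here.
                }
            }
            catch (WebSocketException)
            {
                // Connection dropped (for example, state 'Aborted').
                // Fall through and reconnect with a new socket.
            }

            await Task.Delay(delay, ct);
            // Double the delay up to an assumed cap of 60 seconds.
            delay = TimeSpan.FromSeconds(Math.Min(delay.TotalSeconds * 2, 60));
        }
    }
}
```

A real gateway client would also need to resume or re-identify the session after reconnecting, which this sketch leaves out.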