Polly
Request Info: Retry policy with IHttpClientFactory when calling from an Azure App Service hosted API
I was hoping you could help me understand the inner workings of the Polly retry policy a bit better. I have set up a WaitAndRetry policy using IHttpClientFactory and added my own timeout DelegatingHandler after it. It's a basic implementation that creates a linked token source with my own timeout and throws a TimeoutException when that timeout fires.
    .AddTransientHttpErrorPolicy(p =>
        p.Or<SocketException>()
         .Or<TimeoutException>()
         .OrResult(r => r.StatusCode == HttpStatusCode.NotAcceptable) // retry on 406
         .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(attempt * 500),
             PollyExtensions.LoggingRetryDelegateHandler)) // my own logging handler
    .AddHttpMessageHandler<TimeoutHandler>(); // my own delegating handler
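For context, the TimeoutHandler registered above looks roughly like the following. This is a minimal sketch rather than my exact code: the `RequestTimeout` property name and the exception message are illustrative assumptions, and the 100-second default is the timeout mentioned in this post.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Sketch of a per-attempt timeout handler. Because it is added after the
// Polly policy, it sits inside the retry loop and times each attempt separately.
public class TimeoutHandler : DelegatingHandler
{
    // Hypothetical property name; 100 seconds is the default described in this post.
    public TimeSpan RequestTimeout { get; set; } = TimeSpan.FromSeconds(100);

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Link to the caller's token so an outer cancellation still flows through.
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
        {
            cts.CancelAfter(RequestTimeout);
            try
            {
                return await base.SendAsync(request, cts.Token);
            }
            // Only translate the cancellation when it came from our own timer,
            // not when the caller's token was cancelled.
            catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
            {
                throw new TimeoutException(
                    $"The request timed out after {RequestTimeout.TotalSeconds} seconds.");
            }
        }
    }
}
```

Note that a cancellation arriving through the caller's token is deliberately not converted, so it surfaces as a TaskCanceledException instead of a TimeoutException and is not retried by the policy above. The handler also has to be registered with the container (for example `services.AddTransient<TimeoutHandler>()`) for `AddHttpMessageHandler<TimeoutHandler>()` to resolve it.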
Here is where it gets interesting. This policy is part of an Azure App Service hosted API that calls out to a third-party API, and that API misbehaves often. The case of particular interest is a call that normally takes ~5 sec but can take several minutes when the API is not working correctly. My default timeout is 100 seconds, so the failure I expect is 3 retries, followed by the exception bubbling up as a 500 result after around 300 seconds in total.
What I actually see is the first 2 retries (~200 sec), and then, 30 sec into the 3rd retry, I get a TaskCanceledException. That's a total of ~230 sec.
I found this answer, which explains the 230 sec limit: https://stackoverflow.com/a/38676086
I gather from reading a few posts and my experience that the 230 sec limit applies to both inbound and outbound HTTP requests.
So the question is: why (or how) is the 230 sec limit being applied to a request that Polly retries after 100 sec? I would expect each request to get a fresh Azure limit. One guess is that, under the covers, the HTTP client is reusing the initial connection or socket, and the Azure platform somehow treats the retries as the same connection.
Another option would be to use a Polly timeout policy, but I believe that is implemented in much the same way: a DelegatingHandler with a cancellable token source.
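To make that alternative concrete, here is a rough sketch of what I think the registration would look like with a Polly timeout policy in place of my handler. It assumes Polly v7 with Microsoft.Extensions.Http.Polly; the client name "ThirdPartyApi" and the `AddThirdPartyClient` extension method are placeholders, and the logging delegate is omitted for brevity.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Timeout;

public static class ThirdPartyClientRegistration
{
    // Hypothetical extension method and client name, for illustration only.
    public static IServiceCollection AddThirdPartyClient(this IServiceCollection services)
    {
        services.AddHttpClient("ThirdPartyApi")
            // Outer policy: retry, now also handling Polly's own timeout exception.
            .AddTransientHttpErrorPolicy(p => p
                .Or<SocketException>()
                .Or<TimeoutRejectedException>()
                .OrResult(r => r.StatusCode == HttpStatusCode.NotAcceptable) // retry on 406
                .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(attempt * 500)))
            // Inner policy: added after the retry, so it applies to each attempt separately.
            .AddPolicyHandler(Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(100)));

        return services;
    }
}
```

The one difference I am aware of is that the Polly timeout policy throws TimeoutRejectedException rather than TimeoutException, so the retry policy has to handle that type instead.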
Any thoughts or insight into what might be happening with the HTTP client or the request? I know this isn't specifically related to Polly, but you have a lot of experience with the inner workings, so this is a shot in the dark.
Thank you!