Hedged Requests are not fired after connection timeouts
Description
If a connection establishment is failing with a timeout, configured on the SocketsHttpHandler, hedged requests are not generated and the whole pipeline fails.
Reproduction Steps
PreReq:
- Have some standard hedging handler with 2 endpoint groups: [ "https://localhost:3000", "https://microsoft.com" ]
- Have total request timeout and endpoint timeout long enough to not be reached (e.g. 100 seconds)
- Have a short hedging delay.
- Configure a SocketsHttpHandler with ConnectTimeout short enough to be reached for the first endpoint group (e.g. 1 second)
- Fire a request with the newly created HttpClient
Expected behavior
The expected behavior is that "https://microsoft.com" endpoint is called and the pipeline succeeds.
Actual behavior
It is not actually called and a TaskCanceledException is thrown (coming from the HttpConnectionPool)
Task Canceled Exception
System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
---> System.TimeoutException: The operation was canceled.
---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnectionPool.CreateConnectTimeoutException(OperationCanceledException oce)
at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response
This code reproduces the issue:
using System;
using System.Net.Http;
using System.Net.Security;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
var services = new ServiceCollection();
var clientBuilder = services
.AddHttpClient("ClientR9")
.ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
{
UseProxy = false,
Proxy = null,
ConnectTimeout = TimeSpan.FromSeconds(1),
SslOptions = new SslClientAuthenticationOptions
{
ClientCertificates = [],
RemoteCertificateValidationCallback = (_, _, _, _) => true
},
})
.AddStandardHedgingHandler(routing =>
{
routing.ConfigureOrderedGroups(g =>
{
g.Groups.Add(new UriEndpointGroup
{
Endpoints = new[] { new WeightedUriEndpoint { Uri = new Uri("https://localhost:3000") } }
});
g.Groups.Add(new UriEndpointGroup
{
Endpoints = new[] { new WeightedUriEndpoint { Uri = new Uri("https://microsoft.com") } }
});
});
})
.Configure(opt =>
{
opt.Hedging.MaxHedgedAttempts = 10;
opt.Hedging.Delay = TimeSpan.FromSeconds(11);
opt.Endpoint.CircuitBreaker.FailureRatio = 0.99;
opt.Endpoint.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(900);
opt.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(200);
opt.Endpoint.Timeout.Timeout = TimeSpan.FromSeconds(200);
});
var provider = services.BuildServiceProvider();
var clientFactory = provider.GetRequiredService<IHttpClientFactory>();
var client = clientFactory.CreateClient("ClientR9");
try
{
var response = await client.GetAsync("https://localhost:3000");
Console.WriteLine(response.StatusCode);
}
catch (OperationCanceledException ex)
{
Console.WriteLine("Should never reach here but it does after 1 second");
}
The issue is that OperationCanceledException thrown from SocketsHttpHandler is not threated as transient. The challange is to detect this particular case and not treat other OperationCanceledException as transient.
This is where the exception is thrown and unfortunately there is no consistent way in identifying if it was a task cancelled from the connection timeout or task cancelled from a request timeout, in which case we probably don't want to retry / execute hedged requests:
https://github.com/dotnet/runtime/blob/deb43368fde438b3e6928d3930f1362a4960fdf4/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs#L475
How about checking against ResilienceContext.CancellationToken? If ResilienceContext.CancellationToken is not cancelled and we receive TaskCanceledException with that layout we know that this cancellation came from SocketsHttpHandler when establishing connection.
I think that might work, what about the CancellationToken that is provided to the SendAsync method?
Is that used initially to create a linked CancellationToken for ResilienceContext.CancellationToken ?