extensions icon indicating copy to clipboard operation
extensions copied to clipboard

Hedged Requests are not fired after connection timeouts

Open sandrohanea opened this issue 2 years ago • 4 comments

Description

If a connection establishment is failing with a timeout, configured on the SocketsHttpHandler, hedged requests are not generated and the whole pipeline fails.

Reproduction Steps

PreReq:

  • Have some standard hedging handler with 2 endpoint groups: [ "https://localhost:3000", "https://microsoft.com" ]
  • Have total request timeout and endpoint timeout long enough to not be reached (e.g. 100 seconds)
  • Have a short hedging delay.
  • Configure a SocketsHttpHandler with ConnectTimeout short enough to be reached for the first endpoint group (e.g. 1 second)
  • Fire a request with the newly created HttpClient

Expected behavior

The expected behavior is that "https://microsoft.com" endpoint is called and the pipeline succeeds.

Actual behavior

It is not actually called and a TaskCanceledException is thrown (coming from the HttpConnectionPool)

Task Canceled Exception
System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
 ---> System.TimeoutException: The operation was canceled.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnectionPool.CreateConnectTimeoutException(OperationCanceledException oce)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

sandrohanea avatar Nov 02 '23 16:11 sandrohanea

This code reproduces the issue:

using System;
using System.Net.Http;
using System.Net.Security;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();
var clientBuilder = services
    .AddHttpClient("ClientR9")
    .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
    {
        UseProxy = false,
        Proxy = null,
        ConnectTimeout = TimeSpan.FromSeconds(1),
        SslOptions = new SslClientAuthenticationOptions
        {
            ClientCertificates = [],
            RemoteCertificateValidationCallback = (_, _, _, _) => true
        },
    })
    .AddStandardHedgingHandler(routing =>
    {
        routing.ConfigureOrderedGroups(g =>
        {
            g.Groups.Add(new UriEndpointGroup
            {
                Endpoints = new[] { new WeightedUriEndpoint { Uri = new Uri("https://localhost:3000") } }
            });

            g.Groups.Add(new UriEndpointGroup
            {
                Endpoints = new[] { new WeightedUriEndpoint { Uri = new Uri("https://microsoft.com") } }
            });
        });

    })
    .Configure(opt =>
    {
        opt.Hedging.MaxHedgedAttempts = 10;
        opt.Hedging.Delay = TimeSpan.FromSeconds(11);
        opt.Endpoint.CircuitBreaker.FailureRatio = 0.99;
        opt.Endpoint.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(900);
        opt.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(200);
        opt.Endpoint.Timeout.Timeout = TimeSpan.FromSeconds(200);
    });

var provider = services.BuildServiceProvider();
var clientFactory = provider.GetRequiredService<IHttpClientFactory>();
var client = clientFactory.CreateClient("ClientR9");

try
{
    var response = await client.GetAsync("https://localhost:3000");
    Console.WriteLine(response.StatusCode);
}
catch (OperationCanceledException ex)
{
    Console.WriteLine("Should never reach here but it does after 1 second");
}

The issue is that OperationCanceledException thrown from SocketsHttpHandler is not threated as transient. The challange is to detect this particular case and not treat other OperationCanceledException as transient.

martintmk avatar Nov 02 '23 16:11 martintmk

This is where the exception is thrown and unfortunately there is no consistent way in identifying if it was a task cancelled from the connection timeout or task cancelled from a request timeout, in which case we probably don't want to retry / execute hedged requests:

https://github.com/dotnet/runtime/blob/deb43368fde438b3e6928d3930f1362a4960fdf4/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs#L475

sandrohanea avatar Nov 08 '23 14:11 sandrohanea

How about checking against ResilienceContext.CancellationToken? If ResilienceContext.CancellationToken is not cancelled and we receive TaskCanceledException with that layout we know that this cancellation came from SocketsHttpHandler when establishing connection.

martintmk avatar Nov 08 '23 14:11 martintmk

I think that might work, what about the CancellationToken that is provided to the SendAsync method? Is that used initially to create a linked CancellationToken for ResilienceContext.CancellationToken ?

sandrohanea avatar Nov 08 '23 14:11 sandrohanea