
HttpConnectionPool violates HttpHeaders thread-safety for CONNECT tunnels

Open MihaZupan opened this issue 2 years ago • 20 comments

In .NET 6, the creation of a connection has been decoupled from the initiating request. That is, a connection attempt may still be ongoing even after the initiating request was canceled or was served by a different connection that became available.

In the case of HttpConnectionKind.SslProxyTunnel (CONNECT tunnel), we copy the original request's User-Agent header to the tunnel request here. The problem is that we access the header too late. At that point, the original request's header collection could be enumerated on a different thread (served by a different connection). Or, the SendAsync call associated with the request already returned, seemingly returning ownership back to the user. The user may then attempt to enumerate/inspect the headers, which would again race with the read from EstablishProxyTunnelAsync.

This is a possible explanation for an issue a user hit in https://github.com/dotnet/runtime/issues/61798#issuecomment-1040227700 based on the stack trace.

cc: @Strepto

This specific issue can be worked around by manually forcing the parsing of the User-Agent header:

var socketsHttpHandler = new SocketsHttpHandler();
var customHandler = new CustomHandler(socketsHttpHandler);
var httpClient = new HttpClient(customHandler);

public sealed class CustomHandler : DelegatingHandler
{
    public CustomHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override HttpResponseMessage Send(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing

        return base.Send(request, cancellationToken);
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing

        return base.SendAsync(request, cancellationToken);
    }
}

MihaZupan avatar Feb 15 '22 15:02 MihaZupan

Tagging subscribers to this area: @dotnet/ncl. See info in area-owners.md if you want to be subscribed.

Issue Details


Author: MihaZupan
Assignees: -
Labels: bug, area-System.Net.Http

Milestone: -

msftbot[bot] avatar Feb 15 '22 15:02 msftbot[bot]

Presumably the fix here is to make sure we capture the User-Agent from the original request before we actually put the request in the queue (and make it available for other connections to grab), is that right?

geoffkizer avatar Feb 15 '22 17:02 geoffkizer

Yes, that's right.

MihaZupan avatar Feb 15 '22 17:02 MihaZupan

Somewhat related here: We are forcing the User-Agent to be parsed here because we use TryGetValues. We should avoid this by using NonValidated here.

(I thought there was an issue on this but I couldn't find it...)
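For reference, a minimal sketch of the difference (the URL and header value are placeholders): `NonValidated` reads the raw stored value without running the header parser, whereas `TryGetValues` validates the raw value and caches the parsed result in the header store.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

class Demo
{
    static void Main()
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/");
        request.Headers.TryAddWithoutValidation("User-Agent", "MyApp/1.0");

        // NonValidated returns the raw stored value as-is; it does not
        // invoke the parser or mutate the header store.
        if (request.Headers.NonValidated.TryGetValues("User-Agent", out HeaderStringValues raw))
        {
            Console.WriteLine(raw.ToString());
        }

        // TryGetValues, by contrast, forces the raw value through the
        // User-Agent parser and caches the parsed values in the store.
        if (request.Headers.TryGetValues("User-Agent", out var parsed))
        {
            Console.WriteLine(string.Join(" ", parsed));
        }
    }
}
```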

geoffkizer avatar Feb 15 '22 17:02 geoffkizer

Triage: this impacts reliability with proxies, so we should address it. Hopefully it's not too complex.

karelz avatar Feb 17 '22 17:02 karelz

This has been mitigated by #68115 in 7.0 - you should only hit this issue if the request message's headers are modified after it was sent.

Moving to future.
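Concretely, the remaining unsafe pattern is mutating the request's header collection while a send is still in flight. A minimal sketch of that pattern (`StubHandler` is a hypothetical stand-in so the snippet runs without network access or a real proxy):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Stub handler so the sketch runs without any network access.
sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
}

class Demo
{
    static async Task Main()
    {
        var client = new HttpClient(new StubHandler());
        var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/");

        Task<HttpResponseMessage> sendTask = client.SendAsync(request);

        // Unsafe pattern: mutating the header collection while the send
        // (and, behind a proxy, a background CONNECT attempt) may still
        // be reading it concurrently.
        request.Headers.TryAddWithoutValidation("X-Custom", "value");

        HttpResponseMessage response = await sendTask;
        Console.WriteLine((int)response.StatusCode);
    }
}
```

Creating a fresh HttpRequestMessage per send and leaving its headers untouched after SendAsync is called avoids the race entirely.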

MihaZupan avatar Jul 14 '22 13:07 MihaZupan

What's the workaround in .NET 6? The workaround with CustomHandler below did not work:

    public sealed class CustomHandler : DelegatingHandler
    {
        public CustomHandler(HttpMessageHandler innerHandler) : base(innerHandler)
        {
        }

        protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing

            return base.SendAsync(request, cancellationToken);
        }
    }

We are hitting the IndexOutOfRangeException you mentioned in your original post: https://github.com/dotnet/runtime/issues/61798#issuecomment-1040227700. After spending two days as a team trying to solve it, we wrapped it all in try/catch with retries...

Edgaras91 avatar Jul 26 '22 15:07 Edgaras91

@Edgaras91 can you please post the stack trace where you are seeing these exceptions?

Are you setting SocketsHttpHandler.ConnectCallback? Are you reusing and/or modifying the HttpRequestMessage after you send it?

The workaround should be 100% effective as long as you answer "no" to the above.

MihaZupan avatar Jul 26 '22 15:07 MihaZupan

Stack trace:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Net.Http.Headers.HttpHeaders.ReadStoreValues[T](Span`1 values, Object storeValue, HttpHeaderParser parser, Int32& currentIndex)
   at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringOrStringArray(HeaderDescriptor descriptor, Object sourceValues, String& singleValue, String[]& multiValue)
   at System.Net.Http.Headers.HttpHeaders.GetEnumeratorCore()+MoveNext()
   at System.Linq.Enumerable.SelectManySingleSelectorIterator`2.MoveNext()
   at System.Linq.Enumerable.SelectEnumerableIterator`2.ToList()
   at RestSharp.RestResponse.<>c__DisplayClass0_0.<<FromHttpResponse>g__GetDefaultResponse|0>d.MoveNext()

No ConnectCallback, not re-using HttpRequestMessage. We are however using a 3rd party wrapper "RestSharp", but your workaround is hitting the breakpoint at Headers.TryGetValues.

I will search if this "RestSharp" is doing any of the 2 things you mentioned.

Edgaras91 avatar Jul 26 '22 16:07 Edgaras91

but your workaround is hitting the breakpoint at Headers.TryGetValues.

What do you mean by this?

You are definitely hitting #61798, but if the workaround does not work 100%, you are hitting something other than this specific issue (#65379).

Are you using NewRelic? Using an outdated version (either as a direct dependency, or running on the host machine) was a common cause we saw for #61798.

MihaZupan avatar Jul 26 '22 16:07 MihaZupan

I am not using NewRelic. The workaround is not working.

By "hitting the breakpoint" I meant that I did implement the workaround successfully, but it doesn't work.

Because there is no other solution for this, please reconsider backporting the related fixes to .NET 6. Our solutions can't make HTTP calls reliably.

It may be worth allowing comments on the other issue (https://github.com/dotnet/runtime/issues/61798), as there may be more people experiencing this.

Edgaras91 avatar Jul 26 '22 17:07 Edgaras91

I can also confirm that RestSharp 107.3.0 (the version I'm using) creates a new HttpRequestMessage every time, and there are no signs of SocketsHttpHandler.ConnectCallback in its source code.

So seems like it just "doesn't work".

In your example, is the line below mandatory? We didn't have it in our attempt to use the workaround: var socketsHttpHandler = new SocketsHttpHandler();

I can say that we are only experiencing this issue on a server (release build), not on local machines. We tried both debug and release builds. Is this linked to IIS somehow?

Edgaras91 avatar Jul 27 '22 08:07 Edgaras91

@Edgaras91 can you please open a new issue and share how you are making HTTP calls? Is this only occurring when using a proxy? If you have a reproducible example that would also help a ton to diagnose the issue. IIS does not impact HttpClient's behavior.

Seeing this sort of exception means multiple threads are accessing the header collection concurrently. There are a few known cases (NewRelic, CONNECT tunnels ...), but it's possible something else is happening in your scenario.

MihaZupan avatar Jul 27 '22 22:07 MihaZupan

Will this fix be backported to .NET 6?

madelson avatar Jul 28 '22 20:07 madelson

@madelson can you please give us some numbers on how often it happens to you? How much it impacts your service? Potentially any business impact on your company or your customers. We will need the info to justify backport into 6.0.

karelz avatar Aug 02 '22 11:08 karelz

Triage: Given that we might need to backport the fix to 6.0 (per discussion above - pending impact details), we should fix it in 7.0 (although the impact on 7.0 is much lower than on 6.0) to have the fix ready for potential backport to 6.0.x and to avoid servicing 7.0 if we decide in future to service 6.0.

karelz avatar Aug 02 '22 11:08 karelz

@madelson can you also please confirm whether the workaround mentioned in the top issue (accessing the User-Agent header before sending the request) fully resolves this issue for you (we expect it should)?

MihaZupan avatar Aug 02 '22 12:08 MihaZupan

@MihaZupan thanks for following up. We've deployed the workaround and are actively monitoring. The error was transient, so we want to wait a week or so before declaring victory.

madelson avatar Aug 03 '22 12:08 madelson

@MihaZupan I still have to do some more digging to confirm early next week, but so far it looks like the workaround may not have fixed the issue for us.

madelson avatar Aug 06 '22 22:08 madelson

Can you share how you implemented the workaround? We would expect that it is a 100% reliable fix for this specific issue.

If it's not, it's possible you are running into a different header issue (e.g. using outdated NewRelic) that results in similar issues. Notably, you shouldn't see HttpConnectionPool.EstablishProxyTunnelAsync in the stack trace at all after the workaround.

MihaZupan avatar Aug 07 '22 13:08 MihaZupan

@MihaZupan ok I think we have a false alarm; the error was in an environment that hadn't received the fix yet. I'll continue to monitor.

Here's what our implementation of the workaround looks like:

    var transport = new HttpClientTransport(new UserAgentAccessingHandler(new HttpClientHandler
    {
        Proxy = webProxy,
        MaxConnectionsPerServer = this._maxConnectionsPerServer.Value
    }));
    return new BlobServiceClient(uri, options: new() { Transport = transport });

#if NET6_0 // this will remind us to revisit this if we upgrade our .NET version!
    /// <summary>
    /// Workaround for .NET 6 concurrency bug (see https://github.com/dotnet/runtime/issues/65379).
    /// </summary>
    private sealed class UserAgentAccessingHandler : DelegatingHandler
    {
        private const string UserAgentHeader = "User-Agent";

        public UserAgentAccessingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

        protected override HttpResponseMessage Send(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            _ = request.Headers.TryGetValues(UserAgentHeader, out _); // Force parsing

            return base.Send(request, cancellationToken);
        }

        protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            _ = request.Headers.TryGetValues(UserAgentHeader, out _); // Force parsing

            return base.SendAsync(request, cancellationToken);
        }
    }
#endif

I also just want to confirm that this is a flavor of the error that should be fixed here:

System.IndexOutOfRangeException: Index was outside the bounds of the array. 
at System.Net.Http.Headers.HttpHeaders.ReadStoreValues[T](Span`1 values, Object storeValue, HttpHeaderParser parser, Int32& currentIndex) 
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringOrStringArray(HeaderDescriptor descriptor, Object sourceValues, String& singleValue, String[]& multiValue) 
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringArray(HeaderDescriptor descriptor, HeaderStoreItemInfo info) 
at System.Net.Http.HttpConnectionPool.EstablishProxyTunnelAsync(Boolean async, HttpRequestHeaders headers, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request) 
at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken) 
at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpClient.&lt;SendAsync&gt;g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) 
at Azure.Core.Pipeline.HttpClientTransport.ProcessAsync(HttpMessage message, Boolean async) 
at Azure.Core.Pipeline.HttpPipelineTransportPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline) 
at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.g__ProcessAsyncInner|4_0(HttpMessage message, ReadOnlyMemory`1 pipeline) 
at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Storage.Blobs.BlobRestClient.DownloadAsync(String snapshot, String versionId, Nullable`1 timeout, String range, String leaseId, Nullable`1 rangeGetContentMD5, Nullable`1 rangeGetContentCRC64, String encryptionKey, String encryptionKeySha256, Nullable`1 encryptionAlgorithm, Nullable`1 ifModifiedSince, Nullable`1 ifUnmodifiedSince, String ifMatch, String ifNoneMatch, String ifTags, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.StartDownloadAsync(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, Int64 startOffset, Boolean async, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadStreamingInternal(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, IProgress`1 progressHandler, String operationName, Boolean async, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadInternal(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, Boolean async, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadAsync(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadAsync(CancellationToken cancellationToken) 
at Apt.Platform.DataImportWorker.Web.ImportExecution.Executors.AzureBlobPullExecutor.DownloadFileAsync(BlobContainerClient containerClient, String blobName, String localDirectory, Boolean overwriteExistingFiles, Boolean failOnFileExistFlag, Boolean deleteFilesFromRemoteContainer, CancellationToken cancellationToken)

madelson avatar Aug 08 '22 14:08 madelson

The workaround looks good. And yes, that stack trace should be the same issue.

MihaZupan avatar Aug 08 '22 15:08 MihaZupan

EDIT: thought we had a conclusion here but still looking.

madelson avatar Aug 11 '22 14:08 madelson