
TP worker threads deadlock in `Http2OutputProducer.Stop()` method

Open · valdisiljuconoks opened this issue 1 year ago · 5 comments

Description

After running a gRPC service inside Kestrel for some time, we see huge memory increases and high CPU load, and the service comes close to stalling.

Taking a memory dump, we see that Visual Studio detects a deadlock in the Stop() method of Http2OutputProducer.

[screenshot: Visual Studio parallel-stacks view showing the detected deadlock]

Any hints, ideas, or anything else you have in mind would really be helpful for us here! 👍

Configuration

  • Which version of .NET is the code running on?

.NET 8.0.0, Grpc.AspNetCore 2.59.0

  • What OS version, and what distro if applicable?

Windows Server 2019 Datacenter, version 1809

  • What is the architecture (x64, x86, ARM, ARM64)?

x64

Regression?

Data

Short description of the service: we do server-side streaming from a gRPC service. A background service gets data from Azure Service Bus, prepares it for streaming, and sends it out to each connected client. So every time somebody connects to the gRPC service, we register that client with the background service, open a Channel<T> for communication between the background service and the connected client, and await foreach over incoming messages in the channel to write them out to gRPC's IServerStreamWriter; a sketch of the pattern follows.
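A minimal sketch of that pattern (the Notifier service, SubscribeRequest/Update message types, and ClientRegistry below are illustrative names and assumptions, not our actual code):

using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks;
using Grpc.Core;

// Shared by the background Service Bus consumer and the gRPC service:
// one channel per connected client, fanned out on every prepared message.
public class ClientRegistry
{
    private readonly ConcurrentDictionary<string, ChannelWriter<Update>> _clients = new();

    public void Register(string peer, ChannelWriter<Update> writer) => _clients[peer] = writer;

    public void Unregister(string peer)
    {
        if (_clients.TryRemove(peer, out var writer))
        {
            writer.TryComplete();
        }
    }

    // Called by the background service for each message prepared from Service Bus.
    public void Broadcast(Update update)
    {
        foreach (var writer in _clients.Values)
        {
            writer.TryWrite(update);
        }
    }
}

public class NotifierService : Notifier.NotifierBase // generated gRPC base class (assumed)
{
    private readonly ClientRegistry _registry;

    public NotifierService(ClientRegistry registry) => _registry = registry;

    public override async Task Subscribe(
        SubscribeRequest request,
        IServerStreamWriter<Update> responseStream,
        ServerCallContext context)
    {
        var channel = Channel.CreateUnbounded<Update>();
        _registry.Register(context.Peer, channel.Writer);
        try
        {
            // Pump messages from the per-client channel out to the stream
            // until the client disconnects or the channel is completed.
            await foreach (var update in channel.Reader.ReadAllAsync(context.CancellationToken))
            {
                await responseStream.WriteAsync(update);
            }
        }
        finally
        {
            _registry.Unregister(context.Peer);
        }
    }
}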

Analysis

In the end, the call stack seems to point to the Http2OutputProducer.Stop() method, which waits on _dataWriterLock:

public void Stop()
{
    lock (_dataWriterLock)
    {
        _waitingForWindowUpdates = false;
        if (!_completeScheduled || !_completedResponse)
        {
            _completeScheduled = true;
            EnqueueStateUpdate(State.Aborted);
            Schedule();
        }
    }
}
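For context, the C# lock statement above lowers to roughly the following Monitor calls; the thread parks inside Monitor.Enter until the current owner releases _dataWriterLock, which is why the dump shows Stop() as the blocked frame (an equivalent expansion for illustration only):

public void Stop()
{
    bool lockTaken = false;
    try
    {
        // Blocks indefinitely until _dataWriterLock's current owner releases it.
        Monitor.Enter(_dataWriterLock, ref lockTaken);
        _waitingForWindowUpdates = false;
        if (!_completeScheduled || !_completedResponse)
        {
            _completeScheduled = true;
            EnqueueStateUpdate(State.Aborted);
            Schedule();
        }
    }
    finally
    {
        if (lockTaken)
        {
            Monitor.Exit(_dataWriterLock);
        }
    }
}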

In total there are 89 threads for this process.

valdisiljuconoks · Feb 22 '24 10:02

Tagging subscribers to this area: @dotnet/ncl. See info in area-owners.md if you want to be subscribed.

Issue Details

Author: valdisiljuconoks
Assignees: -
Labels: area-System.Net.Http, tenet-performance, untriaged

Milestone: -

ghost · Feb 22 '24 15:02

Http2OutputProducer seems like something from the aspnetcore repo: https://github.com/dotnet/aspnetcore/blob/32c9ec8d9c4b7f7f544cfb39dbf31f30a860d585/src/Servers/Kestrel/Core/src/Internal/Http2/Http2OutputProducer.cs#L15

CarnaViire · Feb 22 '24 16:02

But what's holding the _dataWriterLock?

Tratcher · Feb 22 '24 17:02

But what's holding the _dataWriterLock?

I see the following call stacks deadlocking:

One thread (#10188):

[Waiting on lock owned by Thread 8248, double-click or press enter to switch to thread]
[Managed to Native Transition]
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.Complete()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessRequestsAsync<Microsoft.AspNetCore.Hosting.HostingApplication.Context>(Microsoft.AspNetCore.Hosting.Server.IHttpApplication<Microsoft.AspNetCore.Hosting.HostingApplication.Context> application = {Microsoft.AspNetCore.Hosting.HostingApplication})
...


and another (#8248):

[Waiting on lock owned by Thread 10188, double-click or press enter to switch to thread]
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2OutputProducer.Stop()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.AbortConnectionFlowControl()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.Complete()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.Abort(Microsoft.AspNetCore.Connections.ConnectionAbortedException error)
...


Sounds like two code paths are trying to flush and abort the connection at the same time?
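If that reading is right, the two stacks form a classic lock-ordering (ABBA) deadlock: one thread holds the frame writer's internal lock and wants _dataWriterLock, while the other holds _dataWriterLock and wants the frame writer's lock. A minimal standalone illustration of that shape (this is not Kestrel's actual code):

using System.Threading;

var frameWriterLock = new object(); // stands in for Http2FrameWriter's internal lock
var dataWriterLock = new object();  // stands in for Http2OutputProducer._dataWriterLock

// Like thread 8248: Http2FrameWriter.Complete() takes the frame writer lock,
// then AbortConnectionFlowControl() -> Stop() needs the data writer lock.
var t1 = new Thread(() =>
{
    lock (frameWriterLock)
    {
        Thread.Sleep(100); // widen the race window for the demo
        lock (dataWriterLock) { }
    }
});

// Like thread 10188: holds the data writer lock, then calls into
// Http2FrameWriter.Complete(), which needs the frame writer lock.
var t2 = new Thread(() =>
{
    lock (dataWriterLock)
    {
        Thread.Sleep(100);
        lock (frameWriterLock) { }
    }
});

t1.Start();
t2.Start();
t1.Join(); // never returns once both threads reach their inner lock
t2.Join();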

valdisiljuconoks · Feb 22 '24 21:02

Thanks, we'll look into it

BrennanConroy · Feb 23 '24 00:02

Hi @valdisiljuconoks

I want to explain the silence on this issue after Brennan last responded. A deadlock like this is a potential denial of service attack, and upon investigation this turned out to be the case.

When we have security fixes, we do them privately and don't discuss them until the patches are out, to protect our customers, as happened today on Patch Tuesday.

You probably didn't see the security implications, and the silence may have left you feeling ignored; I apologise if this was the case. Unfortunately, because this was a public GitHub issue we can't award any bug bounty; the bounty terms and conditions require a private report to MSRC, as detailed in our security policy.

blowdart · May 14 '24 17:05

I'm not after the bounty here :) but it's good to know this was investigated and looked at. What exactly should be updated, the ASP.NET Core runtime? Which version contains the fix - 8.0.5?

valdisiljuconoks · May 14 '24 19:05

I'm not after the bounty here :)

That's not the point. Security issues should be filed privately.

BrennanConroy · May 14 '24 20:05

That's not the point. Security issues should be filed privately.

When filing the issue, we never thought it would turn out to be security related. It looked like an ordinary lock/race issue :)

valdisiljuconoks · May 15 '24 06:05