TP Worker threads deadlock in `Http2OutputProducer.Stop()` method
Description
After running a gRPC service inside Kestrel for some time, we see huge memory increases, high CPU load, and the service basically comes close to stalling.
Taking a memory dump, we see that Visual Studio detects a deadlock in the Stop() method of Http2OutputProducer.
Any hints, ideas, or whatever you have on your mind would really be helpful for us here! 👍
Configuration
- Which version of .NET is the code running on?
  .NET 8.0.0, Grpc.AspNetCore 2.59.0
- What OS version, and what distro if applicable?
  Windows Server 2019 Datacenter (1809)
- What is the architecture (x64, x86, ARM, ARM64)?
  x64
Regression?
Data
Short description of the service: we do server-side streaming from a gRPC service. There is a background service which gets data from Azure Service Bus, prepares it for streaming, and sends it over to each connected client. So, every time somebody connects to the gRPC service, we register that client in the background service, open up a Channel<T> for communication between the background service and the connected client, and await foreach the incoming messages in the channel to write them out to gRPC's IServerStreamWriter.
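For context, here is a minimal sketch of that pattern, assuming hypothetical names for the generated service base (Streamer.StreamerBase), the message type (DataMessage), and the registration point in the background service (IClientRegistry); only the Channel<T> / await foreach / IServerStreamWriter shape comes from the description above.

```csharp
// Minimal sketch (hypothetical names, not the actual service code):
// the gRPC handler creates a per-client Channel<T>, hands its writer to the
// background service, and drains the reader into IServerStreamWriter.
using System.Threading.Channels;
using System.Threading.Tasks;
using Grpc.Core;

public class StreamService : Streamer.StreamerBase // hypothetical generated base class
{
    private readonly IClientRegistry _registry;     // hypothetical handle into the background service

    public StreamService(IClientRegistry registry) => _registry = registry;

    public override async Task Subscribe(
        SubscribeRequest request,                        // hypothetical request message
        IServerStreamWriter<DataMessage> responseStream, // hypothetical streamed message
        ServerCallContext context)
    {
        // One channel per connected client; the background service writes into it.
        var channel = Channel.CreateUnbounded<DataMessage>();
        _registry.Register(context.Peer, channel.Writer);

        try
        {
            // Drain the channel until the client disconnects or the channel is completed.
            await foreach (var message in channel.Reader.ReadAllAsync(context.CancellationToken))
            {
                await responseStream.WriteAsync(message);
            }
        }
        finally
        {
            _registry.Unregister(context.Peer);
        }
    }
}
```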
Analysis
In the end, the call stack seems to point to the Http2OutputProducer.Stop() method, which basically waits on _dataWriterLock:
public void Stop()
{
    lock (_dataWriterLock)
    {
        _waitingForWindowUpdates = false;
        if (!_completeScheduled || !_completedResponse)
        {
            _completeScheduled = true;
            EnqueueStateUpdate(State.Aborted);
            Schedule();
        }
    }
}
In total there are 89 threads for this process.
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
Http2OutputProducer seems like something from the aspnetcore repo: https://github.com/dotnet/aspnetcore/blob/32c9ec8d9c4b7f7f544cfb39dbf31f30a860d585/src/Servers/Kestrel/Core/src/Internal/Http2/Http2OutputProducer.cs#L15
But what's holding the _dataWriterLock?
> But what's holding the _dataWriterLock?
I see the following call stacks deadlocking.
One thread (#10188):
> [Waiting on lock owned by Thread 8248, double-click or press enter to switch to thread]
[Managed to Native Transition]
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.Complete()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessRequestsAsync<Microsoft.AspNetCore.Hosting.HostingApplication.Context>(Microsoft.AspNetCore.Hosting.Server.IHttpApplication<Microsoft.AspNetCore.Hosting.HostingApplication.Context> application = {Microsoft.AspNetCore.Hosting.HostingApplication})
...
and another (#8248):
[Waiting on lock owned by Thread 10188, double-click or press enter to switch to thread]
> Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2OutputProducer.Stop()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.AbortConnectionFlowControl()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.Complete()
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.Abort(Microsoft.AspNetCore.Connections.ConnectionAbortedException error)
...
Sounds like someone wants to flush and abort connections more than once at a time?
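For what it's worth, the two stacks above have the shape of a classic lock-ordering (ABBA) deadlock: one thread appears to hold the per-stream _dataWriterLock and then wants the frame writer's connection-level lock inside Complete(), while the other already holds the connection-level lock (via Abort → Complete → AbortConnectionFlowControl) and then wants _dataWriterLock inside Stop(). A minimal sketch of that shape, with hypothetical names rather than Kestrel's actual fields:

```csharp
// Hypothetical repro of the *shape* of the deadlock (not Kestrel's real code):
// two locks taken in opposite orders by two concurrent code paths.
using System;
using System.Threading;
using System.Threading.Tasks;

class AbbaDeadlockSketch
{
    private readonly object _connectionLock = new(); // stands in for the frame writer's lock
    private readonly object _streamLock = new();     // stands in for _dataWriterLock

    // Path A: stream-level work takes the stream lock, then calls into the
    // frame writer, which takes the connection lock.
    public void CompleteStream()
    {
        lock (_streamLock)
        {
            Thread.Sleep(100); // widen the race window for the demo
            lock (_connectionLock)
            {
                // write/complete frames...
            }
        }
    }

    // Path B: connection abort takes the connection lock, then walks the
    // streams and calls each stream's Stop(), which takes the stream lock.
    public void AbortConnection()
    {
        lock (_connectionLock)
        {
            Thread.Sleep(100); // widen the race window for the demo
            lock (_streamLock)
            {
                // Stop()-style work...
            }
        }
    }

    static void Main()
    {
        var s = new AbbaDeadlockSketch();
        var a = Task.Run(s.CompleteStream);
        var b = Task.Run(s.AbortConnection);
        // Each task grabs its first lock and then blocks forever on the other's lock.
        Console.WriteLine(Task.WaitAll(new[] { a, b }, TimeSpan.FromSeconds(5))
            ? "completed"
            : "deadlocked (timed out)");
    }
}
```

With the sleeps in place this reliably prints "deadlocked (timed out)"; without them the same ordering bug is still there, it just needs an unlucky interleaving to bite.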
Thanks, we'll look into it
Hi @valdisiljuconoks
I want to explain the silence on this issue after Brennan last responded. A deadlock like this is a potential denial of service attack, and upon investigation this turned out to be the case.
When we have security fixes, we do them privately and don't discuss them until the patches are out, to protect our customers, as happened today on Patch Tuesday.
You probably didn't see the security implications, and the silence may have left you feeling ignored; I apologise if that was the case. Unfortunately, because this was a public GitHub issue we can't award any bug bounty; the bounty terms and conditions require a private report to MSRC, as detailed in our security policy.
I'm not after the bounty here :) but it's good to know this was investigated and looked at. What exactly should be updated - the ASP.NET runtime? Which version contains the fix - 8.0.5?
> I'm not after the bounty here :)
That's not the point. Security issues should be filed privately.
> That's not the point. Security issues should be filed privately.
When filing the issue, we never thought it would turn out to be a security-related issue. It looked like an ordinary lock-race-something :)