dotnet-stack hangs when trying to get the stack frames of a stuck process
If I had enough information to file this as a proper bug report, I would. It feels very much like a bug, but it might be a bug in the runtime, or something else. Either way, this behavior is very bad and very unexpected.
Background:
We have a network listener process that has been getting stuck every week or so. It runs on our server and receives (encrypted) data from a process on the customer's server. Our own internal status check on the stuck process also gets stuck, and the symptoms of the stuck-ness make no sense from an application codebase perspective. (Thankfully this process doesn't use async code, so the stack traces ought to make sense.)
So I said OK, let's get a stack trace next time. We looked up how to do this, found dotnet-stack, copied the standalone binary (downloaded from https://aka.ms/dotnet-stack/win-x64 a week and a half ago) to the server (it's a Server Core server), and waited for our process to get stuck again.
So it got stuck, as expected. I then ran dotnet-stack report --process-id 4860, and that got stuck too. In fact it got stuck so badly that ^C didn't get the command prompt back. I tried a second time, running dotnet-stack report --process-id 4860 > stack.txt and just leaving it running with the remote desktop window shoved in the background. After waiting for at least 14 minutes, I found it was still stuck, only this time ^C was able to get the command prompt back. As expected, the output file was empty.
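For the next attempt, one option is to wrap the call in a hard timeout so that a hung dotnet-stack can't tie up the console. This is only a minimal sketch and makes some assumptions: Python is available on the server (it is not part of the setup described here), run_with_timeout is a hypothetical helper name, and the 60-second limit is arbitrary. It does not fix the hang; it just bounds how long we wait and keeps whatever output was produced.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, output).

    finished is False if the child had to be killed after timeout_s seconds.
    output is whatever the child wrote to stdout/stderr before it ended.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return True, out
    except subprocess.TimeoutExpired:
        # Forcibly terminate the child instead of relying on ^C.
        proc.kill()
        out, _ = proc.communicate()
        return False, out


if __name__ == "__main__":
    # Usage (hypothetical script name): python stack_timeout.py 4860
    pid = sys.argv[1]
    finished, output = run_with_timeout(
        ["dotnet-stack", "report", "--process-id", pid],
        timeout_s=60,  # arbitrary limit, adjust as needed
    )
    with open("stack.txt", "w") as f:
        f.write(output)
    if not finished:
        print("dotnet-stack did not finish within the timeout and was killed")
```

If the timeout fires, stack.txt will contain only the partial output (possibly nothing, as in the second attempt above).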
The target process is an x64 .NET 8 process; working memory was 63MB.
We have a full memory dump of the process; the managed runtime is deadlocked.
Summary:
It's possible for dotnet-stack to get stuck while trying to dump the stacks of a stuck process. This seems like it should not happen.
Environment:
Windows Server Core: probably Server Core 2022 but might be 2019
Hosting Environment: Azure (Central)
dotnet-stack: win64 standalone binary
target process: .NET 8 win-x64 process; shipped as framework included (dotnet publish -r win-x64)
Reproducibility:
At this rate I get one attempt a week.
Stuck-ness does not appear to be data-related. On restarting the process it recovers where it left off, successfully processing the very message it hung up in the middle of.