Crash with "munmap_chunk(): invalid pointer" error
Is there an existing issue for this?
- [X] I have searched the existing issues
Describe the bug
During normal operation, we suddenly got the error "munmap_chunk(): invalid pointer", which crashed the application with the log message "Crashing thread 0057 signal 6 (0006)". Analyzing the dump of the crashing thread shows the following stack:
[InlinedCallFrame: 00007f6de6f9c968] Interop+Crypto.X509Destroy(IntPtr)
[InlinedCallFrame: 00007f6de6f9c968] Interop+Crypto.X509Destroy(IntPtr)
System.Runtime.InteropServices.SafeHandle.Finalize() [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs @ 90]
[DebuggerU2MCatchHandlerFrame: 00007f6de6f9cd30]
We are currently using .NET 6.0 GC mode.
Expected Behavior
Application does not crash
Steps To Reproduce
Unable to reproduce
Exceptions (if any)
No response
.NET Version
8.0.7
Anything else?
Host:
Version: 8.0.7
Architecture: x64
Commit: 2aade6beb0
RID: linux-x64
.NET SDKs installed:
No SDKs were found.
.NET runtimes installed:
Microsoft.AspNetCore.App 8.0.7 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.7 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Other architectures found:
None
Environment variables:
Not set
global.json file:
Not found
Unfortunately, with the information given, this issue isn't actionable (no repro, no code, etc.).
Because of Interop+Crypto.X509Destroy(IntPtr) (from the stack trace) I guess this issue should be moved to runtime anyway.
@John-R-D Can you please provide more info about what sort of app you're running? Is it aspnetcore? If so, is it kestrel or IIS? As @gfoidl points out, this isn't actionable with the info provided.
Since dotnet/runtime has some closed issues about munmap_chunk, they might have a better idea of where to start (and can, of course, move the issue back here if it's specific to aspnetcore).
Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones See info in area-owners.md if you want to be subscribed.
We're running aspnetcore using kestrel.
As the stack from the error has so little information, I wouldn't know how to produce test code that can reproduce the issue. However, given the location of the error, it seemed prudent to post it here to at least let you know there is some kind of error.
Do I need to do anything to move this over to dotnet/runtime? Is there some way I could narrow down the source of the issue so that I can produce code that reproduces it?
@bartonjs - Note this is occurring in .NET 8. Does anything come to mind for what could lead to such a crash?
@John-R-D - How frequently is this happening? Is it occurring repeatedly, or did this happen once with service recovery thereafter?
Looking through our logs, I see 5 instances of this in the last 5 months. Four of the cases have identical stack traces; the fifth failed to even collect a dump, but its error message and symptoms match. We are seeing several different error messages associated with the issue:
- .NET Finalizer[11008]: segfault ... in libc.so.6
- double free or corruption (out)
- munmap_chunk(): invalid pointer
We did file a previous issue (https://github.com/dotnet/runtime/issues/101804) that I wasn't aware of when I opened this one, because the error message was different. The stack of the failing thread there is identical. I'm not sure, though, that this new case provides any additional details that would help in identifying the issue.
In the most recent 3 cases, the service did not recover. The application entered a frozen state (twice after the dump was collected and once before we got a dump) and required a manual reset.
Thanks for that extra info and connecting this back to the previous issue, @John-R-D. I'm going to triage this out of the 9.0.0 milestone, but we will leave the issue open for future investigation as it has reoccurred since we closed/locked the previous issue.
As with the sister issue (#101804), without a repro there's nothing we can really do here. We only free certificates via SafeHandle, which guarantees just one free per object. Double-frees would still be possible if the SafeHandle.DangerousGetHandle() value is being used in some other path that can call X509_free, but if we have any of those (that aren't pairing it with an X509_up_ref ahead of time) they should manifest deterministically.
If we have a lifetime/safety problem, I'd love to fix it. But there's simply not enough data here to enable us to find it.
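For readers following the up-ref pairing argument above, here is a minimal sketch in C of why an unpaired free on a borrowed pointer should fail the same way every time. It is a toy model only: `toy_cert`, `toy_new`, `toy_up_ref`, `toy_free`, and `toy_run` are hypothetical names invented for illustration, not OpenSSL or .NET runtime code. The toy deliberately keeps its memory alive after the count reaches zero so that the second free can be counted instead of corrupting the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted native object such as an X509.
   Every name in this block is hypothetical (an assumption for illustration). */
typedef struct {
    int refs;          /* owners that still owe exactly one free */
    bool released;     /* set when the count reaches zero */
    int invalid_frees; /* frees seen after release; a double free in real code */
} toy_cert;

/* Create an object owned by one caller (refs == 1). */
toy_cert *toy_new(void) {
    toy_cert *c = calloc(1, sizeof *c);
    if (c != NULL) {
        c->refs = 1;
    }
    return c;
}

/* Plays the role of X509_up_ref: a second owner registers itself
   before it keeps the pointer. */
void toy_up_ref(toy_cert *c) {
    assert(!c->released);
    c->refs++;
}

/* Plays the role of X509_free: drop one reference, release at zero.
   A free after release is recorded rather than executed. */
void toy_free(toy_cert *c) {
    if (c->released) {
        c->invalid_frees++;
        return;
    }
    if (--c->refs == 0) {
        c->released = true;
    }
}

/* One owner (the SafeHandle role) plus one borrower holding the raw pointer
   (the DangerousGetHandle() role). Returns the number of invalid frees,
   or -1 if allocation failed. */
int toy_run(bool borrower_up_refs) {
    toy_cert *c = toy_new();
    if (c == NULL) {
        return -1;
    }
    toy_cert *borrowed = c;
    if (borrower_up_refs) {
        toy_up_ref(borrowed);
    }
    toy_free(borrowed); /* borrower's cleanup path */
    toy_free(c);        /* owner's single release */
    int bad = c->invalid_frees;
    free(c);            /* really release the toy's own memory */
    return bad;
}
```

In this model `toy_run(true)` reports no invalid free, while `toy_run(false)` reports one on every run. That matches the reasoning above that an unpaired free on a borrowed handle would be expected to show up deterministically, which is why an intermittent crash is harder to explain this way.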