
Crash with "munmap_chunk(): invalid pointer" error

Open John-R-D opened this issue 1 year ago • 6 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Describe the bug

During normal operation, we suddenly got the error "munmap_chunk(): invalid pointer", which crashes the application with the log message "Crashing thread 0057 signal 6 (0006)". Analyzing the dump of the crashing thread shows the following stack:

[InlinedCallFrame: 00007f6de6f9c968] Interop+Crypto.X509Destroy(IntPtr)
[InlinedCallFrame: 00007f6de6f9c968] Interop+Crypto.X509Destroy(IntPtr)
System.Runtime.InteropServices.SafeHandle.Finalize() [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs @ 90]
[DebuggerU2MCatchHandlerFrame: 00007f6de6f9cd30] 

We are currently using .NET 6.0 GC mode.

Expected Behavior

Application does not crash

Steps To Reproduce

Unable to reproduce

Exceptions (if any)

No response

.NET Version

8.0.7

Anything else?

                Host:
                Version:      8.0.7
                Architecture: x64
                Commit:       2aade6beb0
                RID:          linux-x64

                .NET SDKs installed:
                No SDKs were found.

                .NET runtimes installed:
                Microsoft.AspNetCore.App 8.0.7 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
                Microsoft.NETCore.App 8.0.7 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

                Other architectures found:
                None

                Environment variables:
                Not set

                global.json file:
                Not found

John-R-D avatar Aug 19 '24 17:08 John-R-D

Unfortunately, with the information given, this issue isn't actionable (no repro, no code, etc.).

Because of Interop+Crypto.X509Destroy(IntPtr) in the stack trace, I think this issue should be moved to runtime anyway.

gfoidl avatar Aug 20 '24 08:08 gfoidl

@John-R-D Can you please provide more info about what sort of app you're running? Is it aspnetcore? If so, is it kestrel or IIS? As @gfoidl points out, this isn't actionable with the info provided.

Since dotnet/runtime has some closed issues about munmap_chunk, they might have a better idea of where to start (and can, of course, move the issue back here if it's specific to aspnetcore).

amcasey avatar Aug 20 '24 23:08 amcasey

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones. See info in area-owners.md if you want to be subscribed.

We're running aspnetcore using kestrel.
As the stack from the error contains so little information, I wouldn't know how to write test code that reproduces the issue. However, given the location of the error, it seemed prudent to post it here to at least let you know there is some kind of problem.

Do I need to do anything to move this over to dotnet/runtime? And is there some way I could narrow down the source of the issue so that I can produce code that reproduces it?

John-R-D avatar Aug 22 '24 21:08 John-R-D

@bartonjs - Note this is occurring in .NET 8. Does anything come to mind for what could lead to such a crash?

@John-R-D - How frequently is this happening? Is it occurring repeatedly, or did this happen once with service recovery thereafter?

jeffhandley avatar Aug 23 '24 02:08 jeffhandley

Looking through our logs, I see 5 instances of this in the last 5 months. Four of the cases have identical stack traces, and one failed to even collect a dump, but the error message and symptoms match. We are seeing several different error messages associated with the issue:

.NET Finalizer[11008]: segfault ... in libc.so.6
double free or corruption (out)
munmap_chunk(): invalid pointer

We did create a previous issue (https://github.com/dotnet/runtime/issues/101804) that I wasn't aware of when I opened this case because the error message was different; the stack of the failing thread there is identical. I'm not sure this new case provides any additional details that would help in identifying the issue, though.

In the most recent 3 cases, the service did not recover. The application entered a frozen state (twice after the dump was collected and once before we got a dump) and required a manual reset.

John-R-D avatar Aug 27 '24 18:08 John-R-D

Thanks for that extra info and for connecting this back to the previous issue, @John-R-D. I'm going to triage this out of the 9.0.0 milestone, but we will leave the issue open for future investigation since it has recurred after we closed/locked the previous issue.

jeffhandley avatar Sep 10 '24 20:09 jeffhandley

As with the sister issue (#101804), without a repro there's nothing we can really do here. We only free certificates via SafeHandle, which guarantees just one free per object. Double-frees would still be possible if the SafeHandle.DangerousGetHandle() value is being used in some other path that can call X509_free, but if we have any of those (that aren't pairing it with an X509_up_ref ahead of time) they should manifest deterministically.
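
To make that hazard concrete, here is a minimal, hypothetical C# sketch (not the runtime's actual Interop.Crypto code; the P/Invoke bindings and the SafeX509Handle type are assumptions for illustration): a SafeHandle owns an X509 pointer, and a second code path frees the raw pointer obtained via DangerousGetHandle() without first calling X509_up_ref, so the handle's own release later frees the same pointer again.

// Hypothetical sketch only: these libcrypto bindings and SafeX509Handle are
// illustrative, not the runtime's internal Interop.Crypto layer.
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

internal static class Crypto
{
    [DllImport("libcrypto")] internal static extern int X509_up_ref(IntPtr x509);
    [DllImport("libcrypto")] internal static extern void X509_free(IntPtr x509);
}

internal sealed class SafeX509Handle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeX509Handle() : base(ownsHandle: true) { }

    // SafeHandle guarantees ReleaseHandle runs at most once per object,
    // so the handle itself can never double-free.
    protected override bool ReleaseHandle()
    {
        Crypto.X509_free(handle);
        return true;
    }
}

internal static class Example
{
    // BUG: frees a pointer the SafeHandle still owns. When the handle is later
    // disposed or finalized, X509_free runs again on the same pointer and glibc
    // aborts with "double free or corruption" / "munmap_chunk(): invalid pointer".
    internal static void HandOffWithoutUpRef(SafeX509Handle cert)
    {
        IntPtr raw = cert.DangerousGetHandle();
        Crypto.X509_free(raw); // second owner frees without taking its own reference
    }

    // OK: bump OpenSSL's reference count before another component takes ownership,
    // so its eventual X509_free only balances the reference added here.
    internal static void HandOffWithUpRef(SafeX509Handle cert)
    {
        IntPtr raw = cert.DangerousGetHandle();
        Crypto.X509_up_ref(raw);
        Crypto.X509_free(raw); // balanced by the up_ref above; the SafeHandle's free stays valid
    }
}

A mistake of the first kind would normally fail deterministically on the affected code path, which is why the intermittent crashes reported here are hard to attribute without a repro.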

If we have a lifetime/safety problem, I'd love to fix it. But there's simply not enough data here to enable us to find it.

bartonjs avatar Nov 21 '24 00:11 bartonjs