valgrind-macos

Crash when running CoreFoundation applications (and wqthread-related problems)

Open Qix- opened this issue 5 years ago • 9 comments

I'm trying really hard to debug this but getting nowhere, even with lldb, vgdb and valgrind --vgdb=yes.

The new valgrind patch for macOS Mojave installs successfully, but on a few non-trivial applications Memcheck causes a SIGILL to be raised. The application in question otherwise runs fine.

Here is the stack:

==47185== valgrind: Unrecognised instruction at address 0x108afc25c.
==47185==    at 0x108AFC25C: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AFBD3A: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AFB2B1: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AF2720: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AF267C: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AE6B9A: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108DD5F3B: ??? (in /usr/lib/system/libsystem_notify.dylib)
==47185==    by 0x108DD4B77: ??? (in /usr/lib/system/libsystem_notify.dylib)
==47185==    by 0x108DD2F6E: ??? (in /usr/lib/system/libsystem_notify.dylib)
==47185==    by 0x104EF87B9: _CFPrefsExtractQuadrupleFromPathIfPossible (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==47185==    by 0x108AE163C: ??? (in /usr/lib/system/libdispatch.dylib)
==47185==    by 0x108AE2D4A: ??? (in /usr/lib/system/libdispatch.dylib)
==47185== Your program just tried to execute an instruction that Valgrind
==47185== did not recognise.  There are two possible reasons for this.
==47185== 1. Your program has a bug and erroneously jumped to a non-code
==47185==    location.  If you are running Memcheck and you just saw a
==47185==    warning about a bad jump, it's probably your program's fault.
==47185== 2. The instruction is legitimate but Valgrind doesn't handle it,
==47185==    i.e. it's Valgrind's fault.  If you think this is the case or
==47185==    you are not sure, please let us know and we'll try to fix it.
==47185== Either way, Valgrind will now raise a SIGILL signal which will
==47185== probably kill your program.

The only other output comes before the application even starts - I don't know if it's relevant or not:

--47185-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--47185-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--47185-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)

I'm having issues creating a minimal reproduction case, as I have no idea which part of the application is causing this to happen (since the stacktrace doesn't give me any application-specific information).

Any tips on how to debug this would be appreciated :)

Qix- avatar Feb 23 '20 13:02 Qix-

@Qix-, thanks for your report!

The UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option messages are not relevant for this problem (but we should fix them sooner or later...).

Without a program to test, this will be extremely tricky for me to debug. Since _CFPrefsExtractQuadrupleFromPathIfPossible appears in the stack trace, CoreFoundation is involved, which means we might be able to reproduce it with non-CLI programs (Safari, maybe?).

This looks really similar to the ptr_munge bug on macOS 10.15 (where valgrind was not setting up the binary properly and hit a ud2 instruction added by some kind of ASSERT), so more verbose output might give us a bit more information. Could you include the full output of a run with valgrind -d -v -v -v -v -v --trace-syscalls=yes --trace-flags=11111111 --trace-children=yes?
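
For anyone collecting that output, here is a minimal sketch of one way to capture it in a file. The flags are the ones quoted above; the application path and the log file name are placeholders (assumptions), and the script only runs valgrind if both it and the application are actually present.

```shell
#!/bin/sh
# Flags quoted from the request above.
VG_FLAGS="-d -v -v -v -v -v --trace-syscalls=yes --trace-flags=11111111 --trace-children=yes"

# Placeholder values (assumptions): point these at your own app and log file.
APP="${APP:-/path/to/my/app}"
LOG="${LOG:-valgrind-debug.log}"

if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Valgrind writes its messages to stderr by default, so redirect
    # both stdout and stderr into the log file.
    # VG_FLAGS is deliberately left unquoted so it splits into separate flags.
    valgrind $VG_FLAGS "$APP" >"$LOG" 2>&1
    echo "valgrind exited with status $?; output saved to $LOG"
else
    echo "valgrind or $APP not found; nothing was run" >&2
fi
```

The resulting log can get very large at this verbosity, so attaching it as a file is likely easier than pasting it inline.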

LouisBrunner avatar Feb 23 '20 13:02 LouisBrunner

I actually managed to reproduce the problem fairly easily, running valgrind with Hex Fiend (I am guessing any .app will work).

I have good and bad news: I already have a fix which I recovered from an old patch (you can try it here); however, you will probably run into another problem right away: SIGSEGV on start_wqthread.

I also have a fix for that issue, but it is really experimental (see here) and needs polishing.

LouisBrunner avatar Feb 23 '20 14:02 LouisBrunner

Yep definitely using CoreFoundation (it's a game engine, with a bunch of window calls). Glad it wasn't localized to my application ^^

As for testing the patches, should I rebase one onto the other in order to test both, or does the last link include them both? And should I build via the makefiles or is there a fancy brew command to do so?

Qix- avatar Feb 23 '20 14:02 Qix-

I just merged kevent_id, since it was done and it's now obvious that it's needed, which means you can use the feature/wqthread_fix branch directly (keeping in mind that it will probably crash).

Unfortunately, I am not aware of any brew command that does that; you will need to build via the Makefile.
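
A rough sketch of such a build, wrapped in a function so nothing runs until it is called. It uses the autogen/configure/make sequence reported in this thread; the checkout directory name is a hypothetical default, and an existing clone of the repository is assumed.

```shell
#!/bin/sh
# build_wqthread_branch [SRC_DIR]
# SRC_DIR is a hypothetical default: an existing clone of valgrind-macos.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_wqthread_branch() (
    src_dir="${1:-valgrind-macos}"
    cd "$src_dir" || exit 1
    # Branch name taken from the thread.
    git checkout feature/wqthread_fix || exit 1
    # Build and install; each step runs only if the previous one succeeded.
    ./autogen.sh && ./configure && make && sudo make install
)
```

Calling `build_wqthread_branch /path/to/valgrind-macos` would then switch to the branch, build it, and install it. Note that `sudo make install` replaces any valgrind already installed under the default prefix.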

LouisBrunner avatar Feb 23 '20 15:02 LouisBrunner

Output from feature/wqthread_fix. Built valgrind with ./autogen.sh && ./configure && make && sudo make install && valgrind /path/to/my/app

--77136-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)

valgrind: m_syswrap/syswrap-amd64-darwin.c:512 (void wqthread_hijack(Addr, Addr, Addr, Addr, Int, Addr)): Assertion 'tst->os_state.pthread - magic_delta == self' failed.

host stacktrace:
==77136==    at 0x2580521E9: ???

sched status:
  running_tid=0

Thread 1: status = VgTs_WaitSys syscall unix:266 (lwpid 771)
==77136==    at 0x108DF895E: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x104F016A6: -[CFPrefsPlistSource handleReply:toRequestNewDataMessage:onConnection:retryCount:error:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F011DA: -[CFPrefsPlistSource handleReply:toRequestNewDataMessage:onConnection:retryCount:error:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F00CF8: -[CFPrefsSearchListSource handleReply:toRequestNewDataMessage:onConnection:retryCount:error:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x108EABAB9: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x104F00C3A: __34-[_CFXPreferences canLookUpAgents]_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F00AFE: ___CFPrefsDirectMode_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F00527: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F004FD: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x108AEB671: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AFBA42: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AFB595: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x10504622A: __CFStringEncodingICUToBytes (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F8354C: __64-[_CFXPreferences copyKeyListForIdentifier:user:host:container:]_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFF40E: -[__NSArrayM getObjects:range:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFEE41: -[CFPrefsSearchListSource alreadylocked_copyValueForKey:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFEA08: _CFStringCheckAndGetCharacters (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFE950: CFStringHashISOLatin1CString (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EFE90E: CFStringHashISOLatin1CString (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE89B5: -[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE8677: -[_CFXPreferencesHandle copyPrefs] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE83C8: ___CFPrefsCopyDefaultPreferences_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104EE7F88: CFArrayGetCount (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x102AB29ED: -[NSUserDefaults(NSUserDefaults) init] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==77136==    by 0x102ABAB4F: +[NSUserDefaults(NSUserDefaults) standardUserDefaults] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==77136==    by 0x103760C77: +[NSApplication initialize] (in /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit)
==77136==    by 0x1066AE4B1: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x1066AE864: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x1066AE79A: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x1066AF62E: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x10669E68F: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x10669E113: ??? (in /usr/lib/libobjc.A.dylib)
==77136==    by 0x10056E8E8: Cocoa_RegisterApp (SDL_cocoaevents.m:404)
==77136==    by 0x10057603B: Cocoa_CreateDevice (SDL_cocoavideo.m:58)
==77136==    by 0x100487B3A: SDL_VideoInit_REAL (SDL_video.c:505)
==77136==    by 0x10037D379: SDL_InitSubSystem_REAL (SDL.c:206)
==77136==    by 0x10037D712: SDL_Init_REAL (SDL.c:291)
==77136==    by 0x1003953E6: SDL_Init (SDL_dynapi_procs.h:85)
==77136==    by 0x100221249: tide::renderer::init_render_thread() (renderer.cc:79)
==77136==    by 0x1001BD0E3: main (main.cc:60)
client stack range: [0x105E97000 0x106696FFF] client SP: 0x106694E28
valgrind stack range: [0x700001AA0000 0x700001B9FFFF] top usage: 9800 of 1048576

Thread 2: status = VgTs_WaitSys syscall unix:368 (lwpid 5387)
==77136==    at 0x108DF8BFA: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x108E4D6E5: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==77136==    by 0x108E4D3FC: ??? (in /usr/lib/system/libsystem_pthread.dylib)
client stack range: ??????? client SP: 0x70000DDE6F98
valgrind stack range: [0x700006142000 0x700006241FFF] top usage: 3304 of 1048576

Thread 3: status = VgTs_WaitSys syscall mach:31 (lwpid 9987)
==77136==    at 0x108DF721A: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x108DF7767: ??? (in /usr/lib/system/libsystem_kernel.dylib)
==77136==    by 0x108EA60D7: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA5E30: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EB59F2: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EAA1C6: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA9D93: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA9BAB: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x108EA9B36: ??? (in /usr/lib/system/libxpc.dylib)
==77136==    by 0x104F00707: -[_CFXPreferences withConnectionForRole:performBlock:] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F0051A: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x104F004FD: ___CFGetCachedUnsandboxedHomeDirectoryForUser_block_invoke (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==77136==    by 0x108AEB671: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AFAF94: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AEB63C: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AF9508: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108AF9B45: ??? (in /usr/lib/system/libdispatch.dylib)
==77136==    by 0x108E4D6B2: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==77136==    by 0x108E4D3FC: ??? (in /usr/lib/system/libsystem_pthread.dylib)
client stack range: ??????? client SP: 0x70000DE69B28
valgrind stack range: [0x700006246000 0x700006345FFF] top usage: 3872 of 1048576

If you'd like, I can do a run with the high-verbosity flags you mentioned before.

Qix- avatar Feb 23 '20 16:02 Qix-

No it's alright, I get the exact same error, so at least there is comfort in that...

I'll need to continue working on the wqthread fix as it's obviously still buggy.

LouisBrunner avatar Feb 23 '20 16:02 LouisBrunner

Alright, sounds good :) Let me know if I can help in any way! Thanks for all of the work you've done, it's incredibly appreciated.

Qix- avatar Feb 23 '20 16:02 Qix-

Thanks a lot for your kind words! I will try to look into it, but if you want to investigate yourself, it would be greatly appreciated.

LouisBrunner avatar Mar 05 '20 22:03 LouisBrunner

I'm not sure if this is the right place to ask, but is there any tool besides valgrind that can be used to detect memory leaks? Because of the wqthread issues, it's impossible to check a multithreaded app with valgrind on OS X according to https://bugs.kde.org/show_bug.cgi?id=380269, which might be related. That report is already 4 years old, so I guess it's safe to assume it will not be fixed shortly.

ernesernesto avatar Jun 27 '21 10:06 ernesernesto