
libsosplugin cannot be loaded into lldb (macos arm64)

Open yuraaka opened this issue 2 years ago • 22 comments

Description

I am trying to install dotnet-sos so that I can debug .NET programs on a Mac, but after installing it according to this manual, lldb shows errors on startup:

$ lldb
error: this file does not represent a loadable dylib
error: 'setsymbolserver' is not a valid command.
(lldb) 

Further context:

$ cat .lldbinit 
#START - ADDED BY SOS INSTALLER
plugin load /Users/yuraaka/.dotnet/sos/libsosplugin.dylib
setsymbolserver -ms
#END - ADDED BY SOS INSTALLER

I also tried building libsosplugin from source, but got the same result. What am I doing wrong?

Reproduction Steps

  1. Install dotnet-sos
  2. Run lldb
  3. Get an error

Expected behavior

No error appears.

Actual behavior

Error "this file does not represent a loadable dylib".

Regression?

No response

Known Workarounds

No response

Configuration

  • Apple M2 Max
  • macOS Ventura 13.5.2
$ lldb -v
lldb-1403.0.17.67
Apple Swift version 5.8.1 (swiftlang-5.8.0.124.5 clang-1403.0.22.11.100)

$ dotnet-sos --version
7.0.442301+6245a3eeff5a12218eb5b615788d776027133e91

Other information

No response

yuraaka avatar Sep 21 '23 16:09 yuraaka
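
When triaging a startup error like the one above, a reasonable first step is to confirm exactly which file lldb is being asked to load. The following is a minimal sketch, not part of dotnet-sos: the function name `sos_plugin_paths` is hypothetical, and it assumes the init file uses the `plugin load <path>` form shown in the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of dotnet-sos or lldb).
# Prints the path from every "plugin load" line in an lldb init file.
sos_plugin_paths() {
    # Default to the per-user init file; a different file may be passed as $1.
    init_file="${1:-$HOME/.lldbinit}"
    # Matches lines such as:
    #   plugin load /Users/me/.dotnet/sos/libsosplugin.dylib
    sed -n 's/^[[:space:]]*plugin load[[:space:]]\{1,\}\(.*\)$/\1/p' "$init_file"
}
```

Each printed path can then be passed to `file` or `ls -l` to see whether the library exists and what kind of binary it is.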

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

Author: YuraAka
Assignees: -
Labels:

area-Diagnostics-coreclr

Milestone: -

ghost avatar Sep 21 '23 16:09 ghost

This is a known regression in Xcode. I remember talking with @EgorBo about it, so maybe he remembers the details?

filipnavara avatar Sep 21 '23 16:09 filipnavara

I tracked down the Discord thread where we discussed it. Apparently it was broken in Xcode 14.3.1 and worked in Xcode 15 (which has different problems, so beware). A more detailed description was also cross-posted to the Apple forum.

filipnavara avatar Sep 21 '23 16:09 filipnavara

cc @hoyosjs

EgorBo avatar Sep 21 '23 19:09 EgorBo

Apologies - I should have updated the forum thread after our discussion with Apple. They very promptly solved the issue in their codebase. The fix is only now becoming available with Xcode 15 (although there seem to be some issues there right now that need investigation; cc @mikem8361).

hoyosjs avatar Sep 22 '23 00:09 hoyosjs

Apparently it was broken in Xcode 14.3.1 and worked in Xcode 15 (which has different problems, so beware). A more detailed description was also cross-posted to the Apple forum.

@filipnavara are any of the different problems issues we should investigate, and if so would you mind opening new issues to track them? It also sounds like we can close this particular bug because SOS loading is now fixed in Xcode 15, so I will close this issue.

tommcdon avatar Sep 25 '23 15:09 tommcdon

are any of the different problems issues we should investigate, and if so would you mind opening new issues to track them?

I am not sure. I will file a separate issue if necessary. With some Xcode 15 beta I was getting the following error on lldb startup: Error: Fail to initialize coreclr 80070008.

filipnavara avatar Sep 25 '23 15:09 filipnavara

With some Xcode 15 beta I was getting the following error on lldb startup: Error: Fail to initialize coreclr 80070008.

If you are still seeing this error, we will re-activate this issue to track that particular problem

tommcdon avatar Sep 25 '23 16:09 tommcdon

Yes, I've repro'ed the Error: Fail to initialize coreclr 80070008 error on our M1. Initializing the managed hosting layer is failing for some reason, which means managed commands like dumpheap and eeheap won't work. We will continue to investigate, but Apple has made debugging this scenario difficult (we cannot attach to lldb to debug SOS).

mikem8361 avatar Sep 25 '23 16:09 mikem8361

I worked around the initial issue on my Apple M2 machine by compiling LLDB from source.

EgorBo avatar Sep 25 '23 16:09 EgorBo

I can still reproduce the issue with Xcode 15.0.1 (15A507):

➜  ~ dotnet-sos --version
8.0.452401+966acd12b91675a4d06a7572ff47c587f827beaf
➜  ~ lldb --version 
lldb-1500.0.22.8
Apple Swift version 5.9 (swiftlang-5.9.0.128.108 clang-1500.0.40.1)
➜  ~ lldb          
error: this file does not represent a loadable dylib
error: 'setsymbolserver' is not a valid command.
(lldb) 

I worked around the initial issue on my Apple M2 machine by compiling LLDB from source.

Building lldb from source does not fix the issue for me.

ylatuya avatar Nov 27 '23 08:11 ylatuya

I think the problem is that libsosplugin.dylib is an x86_64-only library:

➜  llvm-build file /Users/andoni/.dotnet/sos/libsosplugin.dylib                                                                                                              
/Users/andoni/.dotnet/sos/libsosplugin.dylib: Mach-O 64-bit dynamically linked shared library x86_64  

A workaround is forcing lldb to run as x86_64:

➜  llvm-build arch -arch x86_64 lldb
Current symbol store settings:
-> Cache: /Users/andoni/.dotnet/symbolcache
-> Server: https://msdl.microsoft.com/download/symbols/ Timeout: 4 RetryCount: 0
(lldb)

The fix for this issue is to provide libsosplugin.dylib and libsos.dylib as fat (universal) libraries with x86_64 and arm64 support.

ylatuya avatar Nov 27 '23 12:11 ylatuya

dotnet-sos has a command-line parameter to install the extension for a specific architecture.

filipnavara avatar Nov 27 '23 12:11 filipnavara

dotnet-sos install --arch Arm64 is the command. I'm pretty sure this will fix your issue.

mikem8361 avatar Nov 27 '23 16:11 mikem8361

Thanks, I didn't know dotnet-sos was already providing builds for different architectures. It's strange that the x64 version was installed by default instead of the arm64 one; I was probably using the x64 dotnet version rather than the arm64 one.

Using the arm64 version fixes the issue and I can now reproduce the Error: Fail to initialize coreclr 80070008 issue.

ylatuya avatar Nov 27 '23 17:11 ylatuya
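
The diagnosis above comes down to comparing the architecture of libsosplugin.dylib with the architecture lldb runs as. A minimal sketch of that comparison follows. The function names `macho_arch` and `check_sos_arch` are hypothetical, and the parsing assumes the single-line `file` output for a thin (single-architecture) Mach-O library shown in the thread; a universal binary produces different output that this sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helpers (not part of dotnet-sos or lldb).

# Extract the architecture, i.e. the last word, from one line of `file` output, e.g.
#   /path/libsosplugin.dylib: Mach-O 64-bit dynamically linked shared library x86_64
macho_arch() {
    printf '%s\n' "$1" | awk '/Mach-O/ { print $NF }'
}

# Compare the plugin architecture ($1) with the host architecture ($2).
# Returns non-zero on a mismatch so the result can be used in scripts.
check_sos_arch() {
    plugin_arch="$1"
    host_arch="$2"
    if [ "$plugin_arch" = "$host_arch" ]; then
        echo "ok: plugin and host are both $host_arch"
    else
        echo "mismatch: plugin is $plugin_arch, host is $host_arch"
        return 1
    fi
}

# Example use on a Mac (the plugin path is the default install location seen above):
#   plugin="$HOME/.dotnet/sos/libsosplugin.dylib"
#   check_sos_arch "$(macho_arch "$(file "$plugin")")" "$(uname -m)"
```

On a mismatch, the options reported in this thread are to reinstall with `dotnet-sos install --arch Arm64` or to start lldb with `arch -arch x86_64 lldb`. Note that `uname -m` reports the architecture of the calling shell, so a terminal running under Rosetta prints x86_64 even on Apple silicon.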

With Xcode 15.3, lldb goes straight to SIGKILL.

$ xcodebuild -version   
Xcode 15.3
Build version 15E204a

$ lldb helloworld/dist/helloworld 
zsh: killed     lldb helloworld/dist/helloworld

~/Library/Logs/DiagnosticReports/lldb-2024-09-17-073242.ips shows:

"exception" : {"port":0,"signal":"SIGKILL","guardId":0,"codes":"0x0000000000000000, 0x0000000000000000","violations":["SET_EXCEPTION_BEHAVIOR"],"message":" SET_EXCEPTION_BEHAVIOR on mach port 0 (guarded with 0x0000000000000000)","subtype":"GUARD_TYPE_MACH_PORT","type":"EXC_GUARD","rawCodes":[0,0]},

With PAL_MachExceptionMode=7, it just fails to load the plugin:

$ PAL_MachExceptionMode=7 lldb helloworld/dist/helloworld
SOS_HOSTING: Fail to initialize hosting runtime '/usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.29/libcoreclr.dylib' 80004005
Unrecognized command 'setsymbolserver' because managed hosting failed or was disabled. See sethostruntime command for details.
(lldb) target create "helloworld/dist/helloworld"
Current executable set to '/Users/adeel/projects/helloworld/dist/helloworld' (arm64).

Both are arm64 binaries, so this is about something else?

$ file ~/.dotnet/sos/libsosplugin.dylib /usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.29/libcoreclr.dylib
/Users/adeel/.dotnet/sos/libsosplugin.dylib:                                  Mach-O 64-bit dynamically linked shared library arm64
/usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.29/libcoreclr.dylib: Mach-O 64-bit dynamically linked shared library arm64

am11 avatar Sep 17 '24 05:09 am11

I believe folks have tried the following workaround with some success - https://github.com/dotnet/diagnostics/issues/4551#issuecomment-2142927236.

tommcdon avatar Sep 21 '24 03:09 tommcdon

Same issue on Sequoia 15.0 and Xcode / llvm / lldb 16.0 (released on Monday the 16th).

Better workaround with Apple's lldb (standard installation):

$ sudo cp /Applications/Xcode.app/Contents/Developer/usr/bin/lldb /usr/local/bin
$ sudo install_name_tool -add_rpath /Applications/Xcode.app/Contents/SharedFrameworks /usr/local/bin/lldb
$ sudo codesign --force --sign - /usr/local/bin/lldb

(I chose /usr/local/bin/lldb since it is in PATH before /usr/bin)

Now open a new terminal and start using lldb with libsosplugin (clrstack -f etc. are working). There is no need to specify entitlements or set PAL_MachExceptionMode. It's just that Apple's lldb doesn't have any entitlements set, so a plugin dylib with a different signature fails to load. With ad hoc signing, apparently it's not required to specify the entitlements.

cc @lambdageek @janvorli

am11 avatar Sep 21 '24 16:09 am11
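
For anyone who wants to review the three workaround steps before running them with sudo, they can be wrapped in a small function that prints the commands by default. This is only a sketch: `run_step`, `resign_lldb_copy` and the `DRY_RUN` variable are hypothetical names, and the paths are the ones from the comment above, which assume a standard Xcode installation.

```shell
#!/bin/sh
# Hypothetical wrapper around the three workaround commands quoted above.
# By default it only prints each command; set DRY_RUN=0 to actually run them.

run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # dry run: show the command instead of executing it
    else
        "$@"
    fi
}

resign_lldb_copy() {
    src="${1:-/Applications/Xcode.app/Contents/Developer/usr/bin/lldb}"
    dest_dir="${2:-/usr/local/bin}"
    frameworks="/Applications/Xcode.app/Contents/SharedFrameworks"

    # 1. Copy Apple's lldb, 2. add an rpath so the copy can locate Xcode's
    #    shared frameworks, 3. re-sign the copy with an ad hoc signature.
    run_step sudo cp "$src" "$dest_dir" &&
    run_step sudo install_name_tool -add_rpath "$frameworks" "$dest_dir/lldb" &&
    run_step sudo codesign --force --sign - "$dest_dir/lldb"
}
```

Calling `resign_lldb_copy` with no arguments prints the same three commands as in the comment; the steps are chained with `&&` so a failed copy stops the later steps from running against a missing file.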

@am11, that's awesome, thank you so much for sharing this workaround!

janvorli avatar Sep 23 '24 08:09 janvorli

Do the C# commands like dumpheap -stat work? We have seen problems initializing the .NET hosting on arm64 macOS.

mikem8361 avatar Sep 23 '24 16:09 mikem8361

Do the C# commands like dumpheap -stat work?

Apparently working (when the program is stopped at the exception):

(lldb) dumpheap -stat
Statistics:
          MT Count TotalSize Class Name
000102ed9f50     1        24 System.Reflection.Metadata.TypeNameParseOptions
000102958248     1        24 System.Collections.Generic.StringEqualityComparer
00010295be88     1        24 System.OrdinalCaseSensitiveComparer
00010295b488     1        24 System.Collections.Generic.NonRandomizedStringEqualityComparer+OrdinalIgnoreCaseComparer
...
Total 1,307 objects, 133,745 bytes

@mikem8361 btw https://github.com/dotnet/diagnostics/issues/52 is still relevant for lldb/Unix, e.g. if a class name has a Unicode char, dumpheap -stat renders ? (NörttiNirvana became N?rttiNirvana), while Console.WriteLine output in the lldb REPL prints it correctly. So it's probably related to direct vs. indirect stdout (via lldb APIs). I had to switch to en-US to get , as the number grouping separator, because the Finnish culture uses a non-breaking space (char code 160) as the grouping separator, which looked like:

0001040a92e0     5     8?392 System.Object[]
000105193d58    39    12?424 System.Int32[]
000105196e50   805    71?272 System.String
Total 1?320 objects, 140?633 bytes

(same goes for any non-ASCII char)

am11 avatar Sep 23 '24 18:09 am11