
Demangle Symbols in Debuggers (LLDB, GDB)

Open miguelmartin75 opened this issue 2 years ago • 2 comments

Summary

Related issue which is closed: https://github.com/nim-lang/Nim/issues/8596

  • Nim can be debugged with LLDB (or GDB)
  • Name mangling causes UX issues with debugging in LLDB and GDB by requiring you to refer to Nim symbols in their mangled form.
    • The suggested workaround is quite hacky. For variables, you have to print all local or global variables and then scan the GUI or terminal output for the one you want. For other symbols, e.g. when setting a breakpoint on a function, I can see this being quite frustrating.
    • Preferably we would not have to refer to names as mangled, and e.g. could print x rather than print x_<nim-specific-mangle>
  • This is a common problem, as noted from the forums:
    • https://forum.nim-lang.org/t/9735
    • https://forum.nim-lang.org/t/9906

Description

Here are my findings from researching LLDB. I have not researched GDB. I am posting them here in case others want to implement this, or can point out something I have missed in my proposed solution.

For LLDB, one needs to:

  1. (Required) Let LLDB know how to identify the mangling scheme & how to de-mangle a symbol
  2. (Optional) Implement a Language plugin for deeper LLDB integration

From reading the source: LLDB needs a mangling scheme that it can distinguish from the others, along with code to de-mangle it. The mangling schemes used by other languages/compilers (C++/Itanium, C++/MSVC, D, Rust) each use a prefix that identifies the scheme, and therefore the compiler that produced the name.
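To illustrate the prefix-based classification, here is a minimal Python sketch. It is not LLDB's code: the enum and function names are made up for illustration, and the prefixes are the commonly documented ones (`_Z` for Itanium, `?` for MSVC, `_D` for D, `_R` for Rust v0). Real debuggers handle more cases, such as extra leading underscores on some platforms.

```python
from enum import Enum


class ManglingScheme(Enum):
    """Illustrative subset of schemes; names are hypothetical, not LLDB's."""
    NONE = "none"
    MSVC = "msvc"
    ITANIUM = "itanium"
    RUST_V0 = "rust-v0"
    D = "d"


# Commonly documented prefixes for each scheme (simplified).
_PREFIXES = (
    ("?", ManglingScheme.MSVC),
    ("_R", ManglingScheme.RUST_V0),
    ("_D", ManglingScheme.D),
    ("_Z", ManglingScheme.ITANIUM),
)


def classify(name: str) -> ManglingScheme:
    """Guess the mangling scheme of a symbol from its prefix alone."""
    for prefix, scheme in _PREFIXES:
        if name.startswith(prefix):
            return scheme
    # No known prefix: the debugger treats the name as unmangled.
    return ManglingScheme.NONE
```

A name such as `x_abc123` (an illustrative stand-in for a Nim-generated C identifier, not the exact form the compiler emits) carries no such prefix, so it falls through to `NONE`. That is the core of the problem.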

For Nim: identifying the mangling scheme/language from a mangled name is more complex. This is because Nim is compiled into a target language that applies its own existing mangling scheme. If we had control over the binary or the debug symbol output (e.g. DWARF), I believe this would be easier, but since the target language's compiler produces that output, it is slightly more complex.

To solve this with today's standard Nim compiler, here are my researched steps:

  1. Contain/embed a unique constant identifier within each symbol to identify that this symbol was output from the Nim compiler. Modifications to be done here: https://github.com/nim-lang/Nim/blob/502a4486aeb8d0a5dcdf86540522d3dc16960536/compiler/ccgutils.nim#L71
    • Unfortunately, this identifier could collide with identifiers used by C or C++ code in existing codebases. Using Unicode characters would make conflicts rare, but would require C99 or above
    • This probably requires an RFC and further discussion
  2. Modify LLDB:
    1. Modify the Mangled class
      1. Add mangling scheme enum entry for Nim here: https://github.com/llvm/llvm-project/blob/main/lldb/include/lldb/Core/Mangled.h#L41-L48
      2. Classify if the symbol originates from the Nim compiler with the above knowledge: https://github.com/llvm/llvm-project/blob/main/lldb/source/Core/Mangled.cpp#L42-L79
        • Implementation seems to require one-level deep recursion
      3. Call & implement demangling code in C++
    • Getting this accepted into LLDB might be difficult (because the mangled names are still valid C/C++ identifiers). Perhaps a compiler option similar to Apple's LLDB, or a run-time flag, would be appropriate here. This seems to require many modifications to LLDB; maybe the LLVM folks know best here
  3. (optional): implement a Language plugin. Why? Deeper integration with LLDB
    • See Swift's language plugin for an example; here is its Language class: https://github.com/apple/llvm-project/blob/40e3ca95e3f05c7b5286092d52a33a751a717a5e/lldb/source/Plugins/Language/Swift/SwiftLanguage.h#L26
    • Docs seem to be lacking, but it seems to be for:
      • Help/docs on symbols
      • De-mangle functions without parameters mangled in the name (GetDemangledFunctionNameWithoutArguments)
      • Probably other things, for Swift it seems to be related to the REPL integration with LLDB
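The embed/classify/de-mangle steps above can be sketched in Python as follows. Everything here is an assumption for illustration: the `_NIM_` marker, the `<name><marker><suffix>` layout, and all function names are hypothetical and do not reflect what the Nim compiler or LLDB does today. The Itanium handling covers only the simplest `_Z<length><identifier>...` form, to show why one level of recursion is needed when the generated code is compiled as C++.

```python
import re
from typing import Optional

# Hypothetical marker; the real choice would need an RFC (see step 1).
NIM_MARKER = "_NIM_"


def mangle(name: str, unique: str) -> str:
    """Hypothetical layout: <name><marker><unique suffix>."""
    return f"{name}{NIM_MARKER}{unique}"


def demangle_nim(symbol: str) -> Optional[str]:
    """Return the Nim-level name, or None if the marker is absent."""
    # Split on the last marker so a name containing the marker survives.
    name, sep, _suffix = symbol.rpartition(NIM_MARKER)
    return name if sep and name else None


_ITANIUM_SIMPLE = re.compile(r"_Z(\d+)")


def unwrap_itanium_simple(symbol: str) -> Optional[str]:
    """Extract <identifier> from the simplest form _Z<len><identifier>..."""
    match = _ITANIUM_SIMPLE.match(symbol)
    if match is None:
        return None
    length = int(match.group(1))
    ident = symbol[match.end():match.end() + length]
    return ident if len(ident) == length else None


def demangle(symbol: str) -> Optional[str]:
    """Peel one outer C++ layer if present, then look for the Nim marker."""
    inner = unwrap_itanium_simple(symbol)
    return demangle_nim(inner if inner is not None else symbol)
```

With this layout, `demangle("x_NIM_abc123")` and `demangle("_Z12x_NIM_abc123v")` both yield `x`, while a plain C name such as `printf` yields `None`. The sketch also makes the collision concern concrete: any hand-written C identifier that happens to contain the marker would be misclassified as a Nim symbol.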

Alternatives

Here are some alternatives I can think of, but will likely require more work:

  1. Modify the Nim compiler to output the target assembly directly (or via LLVM). This is related to NIR
    • It would likely be easier to convince the LLVM/LLDB team to merge the name de-mangling changes for Nim if they did not conflict with C/C++ symbols
  2. Write a debugger in Nim. Pros:
    • Would offer a chance to integrate with the compiler, e.g. to evaluate NimScript in the debugger, to modify the program at run-time, or to provide a REPL similar to Swift's
    • Reading & modifying the LLDB code is hard with all the OOP/abstraction

Examples

No response

Backwards Compatibility

My proposed solution will change the way the Nim compiler mangles names. For backward compatibility, one could offer a flag to mangle the old way, though I don't think this flag would be necessary: just re-compile your source if you want debugging support.

Links

Mangling & D:

  • https://dlang.org/blog/2017/12/20/ds-newfangled-name-mangling/

LLDB codepointers:

  • Language Plugin: https://github.com/apple/llvm-project/tree/40e3ca95e3f05c7b5286092d52a33a751a717a5e/lldb/source/Plugins/Language/Swift
  • Mangled: https://github.com/apple/llvm-project/blob/40e3ca95e3f05c7b5286092d52a33a751a717a5e/lldb/source/Core/Mangled.cpp#L319-L341
  • Guess lang: https://github.com/apple/llvm-project/blob/next/lldb/source/Core/Mangled.cpp#L471-L476

Writing a debugger:

  • https://www.timdbg.com/posts/writing-a-debugger-from-scratch-part-1/
  • https://opensource.com/article/18/1/how-debuggers-work

miguelmartin75 avatar Nov 28 '23 00:11 miguelmartin75

+1 Please let's write our own debugger.

Zectbumo avatar Nov 28 '23 03:11 Zectbumo

Implementing this for GDB would be a similar process. Imo adding support to existing debuggers is better than writing our own, since it means less maintenance and allows easy integration with existing tools.

ire4ever1190 avatar Dec 10 '23 10:12 ire4ever1190