Support running native debuggers in ocamltest
This change introduces the ability to run native debuggers (LLDB and GDB) as part of the OCaml testsuite.
GDB and LLDB both provide Python APIs for interacting with the debugger. This makes it possible to write Python-scripted tests that, for example, set breakpoints, inspect stack frames and look up symbol information. In particular, LLDB has a test suite for the Python API (https://github.com/llvm/llvm-project/tree/main/lldb/test/API) with many useful examples of what is possible.
Of course you can still provide CLI arguments to the debugger, with some extra work to sanitize the output to remove machine information (hex-addresses, process ids, file paths etc). For that I've re-used the existing awk dependency which should be sufficient for the Unix platforms that will be initially supported.
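To illustrate the kind of sanitizing described above, here is a minimal sketch in Python rather than awk. The patterns, the placeholder tokens (<ADDR>, <PID>, <PATH>) and the function name sanitize are assumptions made for illustration; they are not the rules used in this PR.

```python
import re

# Hypothetical rules: (pattern, replacement). The awk script in the PR
# may match different things and use different placeholders.
_RULES = [
    # Hexadecimal addresses such as 0x00007ffff7e1a2b0
    (re.compile(r"0x[0-9a-fA-F]+"), "<ADDR>"),
    # Process ids as in "Process 12345" or "process 12345"
    (re.compile(r"\b([Pp]rocess) \d+"), r"\1 <PID>"),
    # Absolute paths: drop the directories, keep the file name
    (re.compile(r"(?:/[\w.+-]+)+/([\w.+-]+)"), r"<PATH>/\1"),
]


def sanitize(line: str) -> str:
    """Replace machine-specific details in one line of debugger output."""
    for pattern, replacement in _RULES:
        line = pattern.sub(replacement, line)
    return line
```

With rules like these, two runs on different machines should produce the same text for lines that differ only in addresses, process ids or directory layout, which is what makes comparison against a reference file possible.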
Having support in ocamltest for LLDB and GDB will help ensure that any future changes don't break the debugging experience and more tests can be written in Python to exercise OCaml debugging functionality.
6d101c0 is a small refactor that makes the existing bytecode debugger tests use debugger_script as the name of the commands file to run against the debugger. To run a debugger test, you set debugger_script to point to the commands file and choose which debugger to run (gdb|lldb|ocamldebug).
This change should be taken as a whole with the commits squashed before merging.
Thanks a lot for the prompt and nice review, @MisterDA!
Antonin Décimo (2024/05/27 03:40 -0700):
- Is it necessary to introduce the dev_null variable? I don't ever recall hearing about a Unix where it's not /dev/null (POSIX defines /dev/null). The "Path to /dev/null" description is redundant; maybe "Path to the null device" is a better wording? It would also be portable to Windows, where the null device is simply called NUL. Note that we also have Filename.null in the standard library.
It's because ocamltest's API does not make it easy, at the moment, to redirect to files directly. To redirect to something, you have to provide the name of a variable that contains the name of the file you want to redirect to.
Also, as you mention, the predefined content of the variable could be OS-specific, which wouldn't be easy to achieve otherwise.
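The indirection can be modelled roughly as follows. This is a Python sketch of the idea only, not ocamltest's API: run_redirected, the env dictionary and the way the variable is looked up are all hypothetical, and os.devnull stands in for the OS-specific default value of the variable.

```python
import os
import subprocess
import sys


def run_redirected(cmd, env, stdout_var):
    """Run cmd with stdout sent to the file named by env[stdout_var].

    The caller passes the *name* of a variable, not a file name; the
    file name is looked up in the environment at run time.
    """
    target = env[stdout_var]
    with open(target, "w") as out:
        return subprocess.run(cmd, stdout=out, check=False).returncode


# The predefined value can differ per OS: os.devnull is "/dev/null" on
# Unix and "nul" on Windows.
env = {"dev_null": os.devnull}
status = run_redirected(
    [sys.executable, "-c", "print('discarded')"], env, "dev_null"
)
```

Because the test only names the variable, the same test description works on every platform as long as the variable is predefined appropriately.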
Moving to Draft status: I have discovered issues with CI permissions and repeatability of generated symbols that need fixing before this is mergeable.
Okay thanks a lot for letting us know!
Will you please post here again to make sure we know the code is ready to be reviewed?
This change adds to ocamltest the ability to run GDB or LLDB (what I'm calling native debuggers). The main changes are in the first four commits.
8f9c18898d768040cc267c6684d582d9507b22b5 provides a hopefully compelling example of what can be done with native debugger support. The fixes in #13271 #13261 #13241 are all checked with this test, which also leaves scope for validating in future that more debugging features work. @gasche @kayceesrk
At the moment the test is implemented with tools we already use in the build scripts (sed and shell), with all the drawbacks of cross-platform shell scripting. In future these tests could rely on the LLDB/GDB Python support instead.
Many thanks for the work!
At this stage I have three observations:
- When running the test, could you please print which debugger is run?
- I notice that in the code you emphasize that the debuggers are for native code, and I was wondering why that matters, actually?
- You have added all those debuggers to files whose name is prefixed with "ocaml_", but this is not really where they should be added, as those files are for tests and actions that are specific to OCaml, which is probably not the case for these debuggers. So I'd suggest adding all this to the "builtin_" files and creating them when they don't exist. Also, I think the "run" prefix in the names of the commands and files does not really look necessary to me?
@shindere answers inline below.
- When running the test, could you please print which debugger is run?
It already does this, it will print out "Debugging program %s with %s" based on the program and debugger used.
- I notice that in the code you emphasize that the debuggers are for native code, and I was wondering why that matters, actually?
I used the distinction of native vs bytecode based on the compiler backends. ocamldebug was already covered by tests in tools-debugger along with the other tools distributed with the compiler. Since lldb/gdb aren't distributed with the compiler and are for debugging native executables, I've put them into a separate directory and set of tests. Other arrangements of directory / test naming are possible, but keeping the information about OS, debugger and architecture obvious is useful when running the tests.
- You have added all those debuggers to files whose name is prefixed with "ocaml_", but this is not really where they should be added, as those files are for tests and actions that are specific to OCaml, which is probably not the case for these debuggers. So I'd suggest adding all this to the "builtin_" files and creating them when they don't exist. Also, I think the "run" prefix in the names of the commands and files does not really look necessary to me?
41ab0f0 moves all the debugger-related code into separate modules. Using builtin_ didn't work because ocamldebug had a dependency on certain ocaml_action paths. This organisation seems better and allowed me to simplify the code for debugger_actions.ml.
Many thanks for the refactoring work, @tmcgilchrist.
Studying the code, I think there is room for further simplification and re-architecting, but I propose to leave that for another PR, since it's not so easy to describe and a lot of time and energy has already been spent on this PR. I think it can safely be merged in its current state, once the Changes entry has been updated.
The new tests have been failing on amd64+ubuntu and arm64 on the Inria CI due to mismatched line numbers and mangling.
It's not clear from your report how to fix the output sanitizer. Putting myself in the shoes of someone willing to fix this (theoretically), could you include examples of the output on these CI machines?
I have a pending fix to tweak the .c line numbers as this is also hitting one of my PRs; however I can't fix the lldb oracles because the lldb tests also depend on a particular python module (embedded_interpreter) which I can't seem to find ATM.
See #13477.
There are a few more fixes required to the Python / sed to make this more general on top of https://github.com/ocaml/ocaml/pull/13477. While I work on that please consider disabling the tests if they are causing problems.
@Octachron if you could include Linux distro, gdb and lldb version numbers along with the failing tests, that would help. I've mainly tested my work on recent Ubuntu 22.04/24.04 distros.
@tmcgilchrist, this information is available on the Inria CI at https://ci.inria.fr/ocaml (on which you have access rights as far as I can see).
Thank you @Octachron, I do have access and have found the right logs.
Miod Vallat (2024/09/25 07:19 -0700):
however I can't fix the lldb oracles because the lldb tests also depend on a particular python module (embedded_interpreter) which I can't seem to find ATM.
In case you didn't find it meanwhile, on Debian at least this seems to be provided by each python3-lldb-{version} package, and there is also a python3-lldb package that I expect pulls the right lldb-version-specific package.
At some point we will need to make sure the GitHub runners do have gdb and lldb and all the Python support code installed so that the tests are actually run rather than skipped.
Is that something you could easily help with, @MisterDA?
@shindere I'll include installing gdb and lldb for Linux in the PR that I'm working on.