Debug: implement breakpoints and single-stepping.
This PR combines several earlier pieces (patchable calls in #12061 and #12101, private copies of code in #12051, and all the prior debug event and instrumentation infrastructure) to implement breakpoints in the guest debugger.
These are implemented as planned in #11964: each sequence point (the location just before a Wasm opcode) is now a patchable call instruction, patched out (replaced with NOPs) by default. When patched in, the breakpoint callsite calls a trampoline with the patchable ABI, which then invokes the breakpoint hostcall. That hostcall emits the debug event and nothing else.
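To make the patch-in/patch-out idea concrete, here is a minimal sketch on a plain byte buffer. It is not the code in this PR: `PatchSite`, its fields, and the choice to encode the callsite as an x86-64 `call rel32` are assumptions for illustration. It also skips everything a real implementation has to do around the write (switching the code memory back to read/write, republishing it, and any icache maintenance).

```rust
/// One of the multi-byte NOP encodings on x86-64
/// (`nopl 0x0(%rax,%rax,1)`); it occupies the same 5 bytes as `call rel32`.
const NOP5: [u8; 5] = [0x0f, 0x1f, 0x44, 0x00, 0x00];

/// Hypothetical record of one patchable callsite (not Wasmtime's real type).
pub struct PatchSite {
    /// Byte offset of the 5-byte slot inside the code buffer.
    pub offset: usize,
    /// Byte offset of the trampoline the patched-in call should reach.
    pub trampoline: usize,
}

impl PatchSite {
    /// Encode `call rel32`: opcode 0xE8 followed by a little-endian
    /// displacement measured from the end of the call instruction.
    fn call_bytes(&self) -> [u8; 5] {
        let rel = (self.trampoline as i64 - (self.offset as i64 + 5)) as i32;
        let d = rel.to_le_bytes();
        [0xe8, d[0], d[1], d[2], d[3]]
    }

    /// Patch the call in: the sequence point now reaches the trampoline.
    pub fn enable(&self, code: &mut [u8]) {
        code[self.offset..self.offset + 5].copy_from_slice(&self.call_bytes());
    }

    /// Patch the call out: the sequence point falls through a single NOP.
    pub fn disable(&self, code: &mut [u8]) {
        code[self.offset..self.offset + 5].copy_from_slice(&NOP5);
    }
}
```

Because the NOP and the call have the same length, enabling or disabling a breakpoint never moves any other instruction, so no offsets elsewhere in the function need to be fixed up.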
A few of the interesting bits in this PR include:
- Implementations of "unpublish" (switch permissions back to read/write from read/execute) for mmap'd code memory on all our platforms.
- Infrastructure in the frame-tables (debug info) metadata producer and parser to record "breakpoint patches".
- A tweak to the NOP metadata packaged with the `MachBuffer` to allow multiple NOP sizes. This lets us use one 5-byte NOP on x86-64, for example (did you know x86-64 had these?!) rather than five 1-byte NOPs.
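As a rough illustration of the multiple-NOP-sizes point, the sketch below fills a gap by always taking the longest NOP encoding that still fits. The `fill_nops` function and the `X86_64_NOPS` table are hypothetical names for this example and say nothing about how the `MachBuffer` metadata is actually laid out.

```rust
/// x86-64 NOP encodings of 1 to 5 bytes (the multi-byte forms are the
/// `0F 1F /0` family). Listing them here is an assumption for the sketch.
pub const X86_64_NOPS: &[&[u8]] = &[
    &[0x90],
    &[0x66, 0x90],
    &[0x0f, 0x1f, 0x00],
    &[0x0f, 0x1f, 0x40, 0x00],
    &[0x0f, 0x1f, 0x44, 0x00, 0x00],
];

/// Fill `len` bytes with NOPs, always taking the longest encoding that fits.
/// Returns `None` if the remaining gap cannot be covered. The greedy choice
/// is enough when a 1-byte NOP exists or only one NOP size is offered; it is
/// not a general solver for arbitrary size sets.
pub fn fill_nops(nops: &[&[u8]], len: usize) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(len);
    while out.len() < len {
        let remaining = len - out.len();
        let best = nops
            .iter()
            .filter(|n| !n.is_empty() && n.len() <= remaining)
            .max_by_key(|n| n.len())?;
        out.extend_from_slice(best);
    }
    Some(out)
}
```

With this table a 5-byte slot becomes a single instruction, while a table holding only `0x90` yields five separate 1-byte NOPs for the same slot. On a fixed-width ISA the table would hold just one 4-byte entry, and any length that is not a multiple of 4 is reported as `None`.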
This PR also implements single-stepping with a global-per-Store flag, because at this point why not; it's a small additional bit of logic to do all patches in all modules registered in the Store when that flag is enabled.
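The relationship between explicit breakpoints and the single-step flag can be modeled roughly as follows. This is a simplified sketch: the `Store` and `Module` types here are stand-ins invented for the example (not Wasmtime's types), and "patching" is represented as membership in a set rather than as writes to code memory.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a module's private code copy.
pub struct Module {
    /// Every sequence point (as a Wasm PC) that has a patchable callsite.
    pub sites: Vec<u32>,
    /// The sites whose call is currently patched in.
    pub patched: BTreeSet<u32>,
}

/// Hypothetical stand-in for the per-Store debugging state.
pub struct Store {
    pub modules: Vec<Module>,
    /// Explicit breakpoints as (module index, Wasm PC) pairs.
    pub breakpoints: BTreeSet<(usize, u32)>,
    /// Store-wide single-step flag.
    pub single_step: bool,
}

impl Store {
    /// Recompute which sites are patched in for every registered module:
    /// all of them while single-stepping, otherwise only the breakpoints.
    pub fn sync_patches(&mut self) {
        let single_step = self.single_step;
        let breakpoints = &self.breakpoints;
        for (idx, module) in self.modules.iter_mut().enumerate() {
            module.patched = module
                .sites
                .iter()
                .copied()
                .filter(|pc| single_step || breakpoints.contains(&(idx, *pc)))
                .collect();
        }
    }

    /// Toggle single-stepping and bring every module's patches up to date.
    pub fn set_single_step(&mut self, enabled: bool) {
        self.single_step = enabled;
        self.sync_patches();
    }
}
```

Keeping the breakpoint set separate from the flag means that turning single-stepping off restores exactly the explicitly requested breakpoints.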
A few realizations for future work:
- The need for an introspection API available to a debugger to see the modules within a component is starting to become clear; either that, or the "module and PC" location identifier for a breakpoint switches to a "module or component" sum type. Right now, the tests for this feature use only core modules. Extending to components should not actually be hard at all; we just need to build the API for it.
- The interaction between inlining and `patchable_call` is interesting: what happens if we inline a `patchable_call` at a `try_call` callsite? Right now, we do not update the `patchable_call` to a `try_call`, because there is no `patchable_try_call`; this is fine in the Wasmtime embedding in practice because we never (today!) throw exceptions from a breakpoint handler. This does suggest to me that maybe we should make patchability a property of any callsite, and allow try-calls to be patchable too (with the same restriction about no return values as the only restriction); but happy to discuss that one further.
The s390x failure looks like an oversight on my part in the patchable-ABI implementation on that ISA -- the clobber-save code implicitly assumes that the clobber set fits in that ABI's special clobber-save region in each frame, but that's no longer true when everything is clobbered. I'll rework it in a separate PR and then rebase this.
Fixed s390x in #12148; that commit is also included on top here so the fix shows up in CI, but I'll rebase it out once that PR merges.
OK, I'm going to go ahead and merge based on Nick's approval here -- thanks Alex and Nick for all the comments!
I had to add some icache coherence handling for aarch64 to make macOS/aarch64 happy in CI (curiously did not reproduce locally on my M1 laptop; but there could be several reasons why cache incoherency would show up differently on different uarchs or nondeterministically in general). @fitzgen mind giving fecbc22 a look and re-r+'ing if OK?
(That commit should properly fix #3310 once it merges as well.)