wazero
Implement DWARF parser for better backtraces
Background
LLVM-based compilers for Wasm, for example C/C++, Rust, Zig, and TinyGo (virtually 100% of viable languages),
emit DWARF information into .debug_* custom sections. The following are the sections contained in a TinyGo binary:
$ wasm-objdump main.wasm -h
main.go.wasm: file format wasm 0x1
Sections:
Type start=0x0000000b end=0x00000158 (size=0x0000014d) count: 42
Import start=0x0000015b end=0x000003df (size=0x00000284) count: 18
Function start=0x000003e2 end=0x000004e8 (size=0x00000106) count: 260
Table start=0x000004ea end=0x000004ef (size=0x00000005) count: 1
Memory start=0x000004f1 end=0x000004f4 (size=0x00000003) count: 1
Global start=0x000004f6 end=0x000004fe (size=0x00000008) count: 1
Export start=0x00000501 end=0x000007ac (size=0x000002ab) count: 31
Code start=0x000007b0 end=0x0001b258 (size=0x0001aaa8) count: 260
Data start=0x0001b25c end=0x00020862 (size=0x00005606) count: 2
Custom start=0x00020866 end=0x00034e9a (size=0x00014634) ".debug_info"
Custom start=0x00034e9d end=0x00035f66 (size=0x000010c9) ".debug_pubtypes"
Custom start=0x00035f6a end=0x000431a7 (size=0x0000d23d) ".debug_loc"
Custom start=0x000431aa end=0x00044f60 (size=0x00001db6) ".debug_ranges"
Custom start=0x00044f62 end=0x00044fa1 (size=0x0000003f) ".debug_aranges"
Custom start=0x00044fa4 end=0x00046ef6 (size=0x00001f52) ".debug_abbrev"
Custom start=0x00046efa end=0x00059503 (size=0x00012609) ".debug_line"
Custom start=0x00059507 end=0x0006510b (size=0x0000bc04) ".debug_str"
Custom start=0x0006510f end=0x0006bf6b (size=0x00006e5c) ".debug_pubnames"
Custom start=0x0006bf6e end=0x0006e6e8 (size=0x0000277a) "name"
Custom start=0x0006e6eb end=0x0006e778 (size=0x0000008d) "producers"
By reading the debug sections, we can associate each Wasm instruction in a function with the specific line of the source code from which the binary was compiled.
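As a first step toward reading those sections, the following is a minimal sketch of pulling the `.debug_*` payloads out of a Wasm binary. The function name `debugSections` is hypothetical and is not part of wazero; the sketch assumes a version 1 binary and only relies on the fact that custom sections have id 0 and begin with a length-prefixed name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"strings"
)

// wasmHeader is the 8-byte preamble: the "\0asm" magic followed by version 1.
var wasmHeader = []byte{0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00}

// debugSections (hypothetical helper) returns the payload of every custom
// section whose name starts with ".debug_", keyed by section name.
func debugSections(bin []byte) (map[string][]byte, error) {
	if !bytes.HasPrefix(bin, wasmHeader) {
		return nil, errors.New("not a Wasm version 1 binary")
	}
	out := map[string][]byte{}
	pos := len(wasmHeader)
	for pos < len(bin) {
		id := bin[pos]
		pos++
		// Section sizes are unsigned LEB128, which binary.Uvarint decodes.
		size, n := binary.Uvarint(bin[pos:])
		if n <= 0 || size > uint64(len(bin)-pos-n) {
			return nil, errors.New("malformed section size")
		}
		pos += n
		body := bin[pos : pos+int(size)]
		pos += int(size)
		if id != 0 { // only custom sections (id 0) carry DWARF data
			continue
		}
		nameLen, n := binary.Uvarint(body)
		if n <= 0 || nameLen > uint64(len(body)-n) {
			return nil, errors.New("malformed custom section name")
		}
		name := string(body[n : n+int(nameLen)])
		if strings.HasPrefix(name, ".debug_") {
			out[name] = body[n+int(nameLen):]
		}
	}
	return out, nil
}
```

For the TinyGo binary listed above, the returned map would be expected to contain keys such as ".debug_info" and ".debug_line", while "name" and "producers" would be skipped.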
Why?
Some of the de facto standard Wasm tools already support the DWARF format. For example, Google Chrome[3] allows users to debug Wasm programs in the browser. Another example is Wasmtime: when you run the panic example in this repo with WASMTIME_BACKTRACE_DETAILS=1, you can see the backtrace with source code information:
$ WASMTIME_BACKTRACE_DETAILS=1 wasmtime run examples/wasm/trap.wasm --invoke cause_panic
panic: causing panic!!!!!!!!!!
Error: failed to run main module `examples/wasm/trap.wasm`
Caused by:
0: failed to invoke `cause_panic`
1: wasm trap: unreachable
wasm backtrace:
0: 0x92a - runtime.abort
at /usr/local/lib/tinygo/src/runtime/runtime_tinygowasm.go:63:6
- runtime._panic
at /usr/local/lib/tinygo/src/runtime/panic.go:13:7
1: 0x9ba - main.three
at /home/mathetake/gasm/examples/wasm/trap.go:19:7
2: 0x9b0 - main.two
at /home/mathetake/gasm/examples/wasm/trap.go:15:7
3: 0x9a6 - main.one
at /home/mathetake/gasm/examples/wasm/trap.go:11:5
4: 0x99c - cause_panic
at /home/mathetake/gasm/examples/wasm/trap.go:7:5
On the other hand, at the time of this writing, our backtrace does not use DWARF; it just parses the "name" custom section and attaches each function name:
panic: causing panic!!!!!!!!!!
wasm runtime error: unreachable
wasm backtrace:
0: runtime._panic
1: main.three
2: main.two
3: main.one
4: cause_panic
This will be much more useful when users run non-TinyGo Wasm binaries: function names are usually mangled by compilers (luckily TinyGo does not do this!), so they are basically not human-readable. For example, a Rust binary's backtrace based only on the "name" custom section would look like this:
0: 0x42deb - __rust_start_panic
1: 0x42c0c - rust_panic
2: 0x42882 - _ZN3std9panicking20rust_panic_with_hook17h072472ae3822b936E
3: 0x32914 - _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17hed88036b12f483dfE
4: 0x34891 - _ZN3std10sys_common9backtrace26__rust_end_short_backtrace17h9133fcc3e85035deE
5: 0x32810 - _ZN3std9panicking11begin_panic17he6f6e918174263cfE
6: 0x39eb - _ZN77_$LT$http_headers..HttpHeaders$u20$as$u20$proxy_wasm..traits..HttpContext$GT$6on_log17hde90e85ea16e616eE
7: 0x2ae53 - _ZN10proxy_wasm10dispatcher10Dispatcher6on_log17hc6cd4fb35c538b86E
8: 0x2d3dd - _ZN10proxy_wasm10dispatcher12proxy_on_log28_$u7b$$u7b$closure$u7d$$u7d$17h3f864ec735f41e70E
9: 0x311bd - _ZN3std6thread5local17LocalKey$LT$T$GT$8try_with17hc87d8e9cf2d2494cE
With DWARF information, we don't need to parse the "name" custom section, so we won't suffer from these mangled symbols; instead, we can emit each trace with human-readable function names plus source code info.
How?
The Wasm DWARF format[1] is almost the same as the standard DWARF specification (version 5?)[2]. The difference is that an address should be interpreted as an offset from the beginning of the code section, rather than from the beginning of the binary as in non-Wasm formats.
So it should be simple to write a parser by drawing on other DWARF implementations.
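The address difference described above amounts to one subtraction. The following is a minimal sketch; `dwarfAddr` and its parameters are hypothetical names, and whether the code section start means the section header or the start of its contents is an assumption that should be checked against [1].

```go
package main

import "fmt"

// dwarfAddr (hypothetical helper) converts an instruction's offset from the
// start of the binary into a code-section-relative address, which is the
// form the Wasm DWARF convention uses for addresses.
// codeSectionStart is assumed to be the offset at which the code section
// begins, measured the same way as fileOffset.
func dwarfAddr(fileOffset, codeSectionStart uint64) (uint64, error) {
	if fileOffset < codeSectionStart {
		return 0, fmt.Errorf("offset %#x precedes code section start %#x",
			fileOffset, codeSectionStart)
	}
	return fileOffset - codeSectionStart, nil
}
```

For example, with a code section starting at 0x7b0, an instruction at file offset 0x8b0 would be looked up in the line table as address 0x100.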
Links
[1] https://yurydelendik.github.io/webassembly-dwarf/
[2] https://dwarfstd.org/doc/DWARF5.pdf (PDF)
[3] https://twitter.com/ChromeDevTools/status/1192803818024710145
Hi @mathetake, is this issue still relevant? I'm looking to contribute and landed on it.
@r8d8 I think this definitely is still relevant, and we'd want to start with backtrace enhancement. If I understand you correctly, you are interested in contributing this? If so, I'd recommend starting small, as this may touch a few different spots, and iterating in small steps will be less of a burden, especially in an area that is not 100% defined in a spec.
@mathetake https://github.com/yurydelendik/webassembly-dwarf is abandoned and the author isn't replying to issues anymore. We should ask the actual spec about this and cite something with a future to avoid compatibility drift, possibly asking other implementers which "specs" they plan to use. We can tentatively use the dead one of course, but some time before 1.0 we need to firm this up. wdyt?
DWARF for Wasm was added to tool-conventions, which says:
These conventions are not part of the WebAssembly standard, and are not required of WebAssembly-consuming implementations to execute WebAssembly code. Tools producing and working with WebAssembly in other ways also need not follow any of these conventions. They exist only to support tools that wish to interoperate with other tools at a higher abstraction level than just WebAssembly itself.
This means there won't be any formal specification; instead we have to follow the (personally hosted) specification (https://yurydelendik.github.io/webassembly-dwarf/). So we’ll have to choose a way and possibly compare against another implementation. Fortunately, the implementation is stable in the sense that major Wasm runtimes and compilers implement it (clang/LLVM, wasmtime, V8).
As for contribution, I think this could be multiple weeks or even months of full-time work. This includes implementing a binary parser for DWARF 5 (note that there's nothing we can reference or use in the existing Go ecosystem, meaning that we have to implement it literally from scratch), refactoring the JIT compiler and interpreter so they can track the original Wasm instruction address in our runtime representation, etc.
That said, as @codefromthecrypt suggested, I would recommend starting small rather than with an overwhelming task like this.
@codefromthecrypt @mathetake Thanks for your reply. I will take a look in the direction of WASI support.
As a general thought, there were DWARF 5 pieces added to the main Go tool chain a while back:
- https://go-review.googlesource.com/c/go/+/175137/
- https://go-review.googlesource.com/c/go/+/175138/2
So, things might not be completely from scratch. :smile:
Oh that's cool! Thank you for the info! @justinclift
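To illustrate the point that the work might not be completely from scratch, here is a rough sketch that feeds the custom section payloads to Go's standard `debug/dwarf` package. The helpers `newDWARF` and `sourceLine` are hypothetical, the `sections` map (section name to payload) is an assumed input, and this has not been verified against real Wasm binaries, so treat it as a starting point rather than a working backtrace implementation.

```go
package main

import (
	"debug/dwarf"
	"errors"
)

// newDWARF (hypothetical helper) builds a *dwarf.Data from a map of custom
// section name to payload. Missing sections are passed as nil slices.
func newDWARF(sections map[string][]byte) (*dwarf.Data, error) {
	return dwarf.New(
		sections[".debug_abbrev"],
		sections[".debug_aranges"],
		nil, // .debug_frame: not present in the TinyGo listing above
		sections[".debug_info"],
		sections[".debug_line"],
		sections[".debug_pubnames"],
		sections[".debug_ranges"],
		sections[".debug_str"],
	)
}

// sourceLine (hypothetical helper) looks up the file and line for addr,
// which is assumed to already be a code-section-relative address.
func sourceLine(d *dwarf.Data, addr uint64) (string, int, error) {
	// Find the compilation unit whose address ranges cover addr.
	cu, err := d.Reader().SeekPC(addr)
	if err != nil {
		return "", 0, err
	}
	lr, err := d.LineReader(cu)
	if err != nil {
		return "", 0, err
	}
	if lr == nil {
		return "", 0, errors.New("compilation unit has no line table")
	}
	var entry dwarf.LineEntry
	if err := lr.SeekPC(addr, &entry); err != nil {
		return "", 0, err
	}
	if entry.File == nil {
		return "", 0, errors.New("line entry has no file")
	}
	return entry.File.Name, entry.Line, nil
}
```

Function names for each frame would still need to be resolved separately, for instance by walking the subprogram entries of the compilation unit; that part is left out of this sketch.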