Understanding DWARF+WebAssembly offsets
Working on binaryen support for DWARF, I realized I don't know how to read the line info data. The main issues are:
- The code addresses doc says offsets are "the offset of an instruction relative within the Code section of the WebAssembly file". Does "the Code section" include the entire code section, with the 0x0a byte to declare the code section and the LEB for the length? Or just the body, without those?
- Can debug lines refer to code section offsets that are not code? (Like the function declarations.)
- Can debug lines refer to inner parts of an instruction, and not the start?
In more detail, here is what I am trying. I started with @yurydelendik's fib2 sample:
__attribute__((used))
int fib(int n) {
  int i, t, a = 0, b = 1;
  for (i = 0; i < n; i++) {
    t = a;
    a = b;
    b += t;
  }
  return b;
}
and I build it with
clang fib2.c -O3 -g -o fib2.clang.wasm -target wasm32-unknown-emscripten -nostdlib -Wl,--no-entry
LLVM's dwarfdump says this:
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000002      2      0      1   0             0  is_stmt
0x000000000000000b      4     17      1   0             0  is_stmt prologue_end
0x0000000000000010      4      3      1   0             0
0x0000000000000012      0      3      1   0             0
0x000000000000001e      7      7      1   0             0  is_stmt
0x0000000000000025      0      7      1   0             0
0x0000000000000029      4     17      1   0             0  is_stmt
0x000000000000002e      4      3      1   0             0
0x0000000000000034      9      3      1   0             0  is_stmt
0x0000000000000037      9      3      1   0             0  is_stmt end_sequence
The first line there says address 2. If the offset is in the code section body, then that's in the middle of the function declaration, and not executable code. Is that expected?
The fifth line has address 0x1e. Looking in the binary, though, the code section's body starts at 0x2d, and adding the offset we get 0x4b. That is the second out of 2 bytes of an i32.const -1, which seems odd?
Also, when I load the wasm in the code explorer, it only shows 3 lines in the UI (2, 4, 7) while the debug line table also mentions line 9. Looking at that line 9 info, it starts at 0x34 which, relative to the start of the code section's body, is at 0x61 - which is past the end of the code section..?
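The offset arithmetic in the paragraphs above can be sketched as follows. This is only an illustration: `dwarf_addr_to_file_offset` is a hypothetical helper, not part of any tool, and the base `0x2d` is the file offset of the code section's body as read from this particular binary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any tool): convert a DWARF line-table
 * address into an absolute file offset, assuming the address is relative
 * to `code_base`, the file offset where the code section's body begins. */
static uint32_t dwarf_addr_to_file_offset(uint32_t code_base, uint32_t dwarf_addr) {
    /* Guard against wrap-around of the 32-bit sum. */
    assert(dwarf_addr <= UINT32_MAX - code_base);
    return code_base + dwarf_addr;
}
```

With the values from this binary, `dwarf_addr_to_file_offset(0x2d, 0x1e)` gives `0x4b` and `dwarf_addr_to_file_offset(0x2d, 0x34)` gives `0x61`, matching the numbers above.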
cc @dschuff @yurydelendik
the offset of an instruction relative within the Code section of the WebAssembly file
The Code section starts at its function count LEB. Several decisions led to this:
- We can potentially point to a function's locals bytes (see the related response below), so it was decided that it is better to start well before the first function length LEB.
- No valid DWARF offset shall be 0, and no range shall start from 0. We are reserving that for dead symbols: when the linker cannot relocate an entry, it places 0 in the .debug_info or .debug_line table.
- WASM files can potentially be manipulated to remove sections (and rewrite the section header), so the decision was made to make DWARF code offsets relative to the actual code section start.
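Under the convention described in this list, the base for DWARF code offsets could be located roughly like this. It is a minimal sketch, not binaryen's or LLVM's code: `read_uleb32` and `find_code_section_start` are hypothetical names, and the sketch relies only on the standard binary layout (an 8-byte header, then sections encoded as an id byte, a LEB128 size, and the contents, with id 10 for the Code section).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned LEB128 value at buf[*pos], advancing *pos.
 * Returns 0 on success, -1 if the input is truncated or too long. */
static int read_uleb32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*pos >= len) return -1;
        uint8_t byte = buf[(*pos)++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *out = result;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical helper: return the file offset of the first byte of the
 * Code section's contents (the function count LEB), or -1 if the module
 * has no Code section or is malformed. */
static long find_code_section_start(const uint8_t *buf, size_t len) {
    static const uint8_t header[8] = {0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00};
    assert(buf != NULL);
    if (len < 8 || memcmp(buf, header, 8) != 0) return -1;
    size_t pos = 8;
    while (pos < len) {
        uint8_t id = buf[pos++];
        uint32_t size;
        if (read_uleb32(buf, len, &pos, &size) != 0) return -1;
        if (size > len - pos) return -1;  /* section runs past end of file */
        if (id == 10) return (long)pos;   /* 10 is the Code section id */
        pos += size;                      /* skip this section's contents */
    }
    return -1;
}
```

A line-table address would then be added to the returned offset to get a position in the file.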
Can debug lines refer to code section offsets that are not code?
In theory, yes. .debug_info will have ranges that point to the entire function body. On the debugger side, a "PC" pointing at the locals bytes may signal entering a frame. This is not used at the moment, so we can change that requirement and use only offsets that point to the code section body/instructions.
Can debug lines refer to inner parts of an instruction, and not the start?
I am not sure DWARF has a requirement to point only to the start of an instruction.
The relocation section is definitely capable of pointing to inner parts of an instruction.
Thanks @yurydelendik !
No valid DWARF offset shall be 0, and no range shall start from 0. We are reserving that for dead symbols: when the linker cannot relocate an entry, it places 0 in the .debug_info or .debug_line table.
Interesting, why not just drop that line then, seems like it won't be usable later anyhow? Or is there some other use for the information?
I am not sure DWARF has a requirement to point only to the start of an instruction.
It would require some additional logic in binaryen to support that. I was hoping not to need it...
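As a rough idea of what that additional logic might look like, here is a sketch that maps an address landing inside an instruction back to the instruction containing it. It is an assumption for illustration, not binaryen's implementation: `find_enclosing_instruction` is a hypothetical name, and it presumes the tool already has a sorted list of instruction start offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given instruction start offsets sorted ascending,
 * return the index of the last instruction starting at or before `addr`,
 * or -1 if `addr` precedes the first instruction. The caller still has to
 * check that `addr` is not past the end of the last instruction. */
static long find_enclosing_instruction(const uint32_t *starts, size_t n, uint32_t addr) {
    assert(starts != NULL || n == 0);
    size_t lo = 0, hi = n; /* binary search over the window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] <= addr) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    /* lo is now the number of starts that are <= addr. */
    return (long)lo - 1;
}
```

For example, with illustrative starts `{0x48, 0x4a, 0x4c}` (only the 2-byte `i32.const -1` ending at `0x4b` comes from this thread; the neighbours are made up), address `0x4b` resolves to index 1, the instruction starting at `0x4a`.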
Interesting, why not just drop that line then
lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?
seems like it won't be usable later anyhow?
It is not useful. Notice that .debug_line encodes only a few offsets, and the rest of them are deltas. That means the deltas become invalid/dead as well.
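The point about deltas can be illustrated with a deliberately simplified model. Real line programs are opcode streams, so this is only a sketch under stated assumptions: `LineSequence` and `resolve_sequence` are hypothetical, and a sequence is modelled as one absolute base address plus the deltas between consecutive rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of one .debug_line sequence: one absolute
 * base address followed by address deltas between consecutive rows. */
typedef struct {
    uint32_t base;
    const uint32_t *deltas;
    size_t delta_count;
} LineSequence;

/* Write the resolved row addresses into `out` (capacity `cap`) and return
 * how many were written. A base of 0 marks the sequence as dead, so every
 * delta that hangs off it is dropped as well and 0 is returned. */
static size_t resolve_sequence(const LineSequence *seq, uint32_t *out, size_t cap) {
    assert(seq != NULL && out != NULL);
    if (seq->base == 0 || cap == 0) return 0;
    size_t n = 0;
    uint32_t addr = seq->base;
    out[n++] = addr;
    for (size_t i = 0; i < seq->delta_count && n < cap; i++) {
        addr += seq->deltas[i]; /* each row is relative to the previous one */
        out[n++] = addr;
    }
    return n;
}
```

With a base of `0x2` and deltas `{0x9, 0x5, 0x2}`, this reproduces the first four addresses of the fib2 table (`0x2`, `0xb`, `0x10`, `0x12`); with a base of 0 the same deltas yield nothing.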
It would require some additional logic in binaryen to support that.
Agreed. We can recommend that for WebAssembly DWARF.
On Thu, Dec 19, 2019 at 3:45 PM Yury Delendik wrote:
Interesting, why not just drop that line then
lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?
Correct, the linker doesn't do anything to DWARF info other than concatenate it. This is by design.
Thank you for the explanation @yurydelendik!
I'm trying to parse DWARF line info for https://github.com/turbolent/w2c2 and had the same questions after reading the spec, i.e. where the start of the code section is, and whether it is normal that line addresses sometimes point to the middle of instructions. Maybe it is worth documenting this better in the spec?
I'm still a bit confused about the last part, addresses pointing to the middle of instructions. Why not require alignment?