Understanding DWARF+WebAssembly offsets

Open kripken opened this issue 5 years ago • 6 comments

Working on binaryen support for DWARF, I realized I don't know how to read the line info data. The main issues are:

  • The code addresses doc says offsets are "the offset of an instruction relative within the Code section of the WebAssembly file". Does "the Code section" include the entire code section, with the 0x0a byte that declares the code section and the LEB for the length? Or just the body, without those?
  • Can debug lines refer to code section offsets that are not code? (Like the function declarations.)
  • Can debug lines refer to inner parts of an instruction, and not the start?

In more detail here is what I am trying: I started with @yurydelendik 's fib2 sample,

__attribute__((used))
int fib(int n) {
  int i, t, a = 0, b = 1;
  for (i = 0; i < n; i++) {
    t = a;
    a = b;
    b += t;
  }
  return b;
}

and I build it with

clang fib2.c -O3 -g -o fib2.clang.wasm  -target wasm32-unknown-emscripten -nostdlib -Wl,--no-entry

LLVM's dwarfdump says this:

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000002      2      0      1   0             0  is_stmt
0x000000000000000b      4     17      1   0             0  is_stmt prologue_end
0x0000000000000010      4      3      1   0             0 
0x0000000000000012      0      3      1   0             0 
0x000000000000001e      7      7      1   0             0  is_stmt
0x0000000000000025      0      7      1   0             0 
0x0000000000000029      4     17      1   0             0  is_stmt
0x000000000000002e      4      3      1   0             0 
0x0000000000000034      9      3      1   0             0  is_stmt
0x0000000000000037      9      3      1   0             0  is_stmt end_sequence

The first line there says address 2. If the offset is in the code section body, then that's in the middle of the function declaration, and not executable code. Is that expected?

The fifth line has address 0x1e. Looking in the binary, though, the code section's body starts at 0x2d, and adding the offset we get 0x4b. That is the second out of 2 bytes of an i32.const -1, which seems odd?
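The arithmetic in this comment can be reproduced with a short sketch. It takes the 0x2d body start from the post as given (it is not independently verified here) and adds it to each address in the dwarfdump table above; the names `CODE_BODY_START`, `LINE_TABLE_ADDRESSES`, and `to_file_offsets` are made up for illustration.

```python
# Assumption: the code section body starts at file offset 0x2d, as stated in the post.
CODE_BODY_START = 0x2D

# Addresses copied from the dwarfdump line table above.
LINE_TABLE_ADDRESSES = [0x02, 0x0B, 0x10, 0x12, 0x1E, 0x25, 0x29, 0x2E, 0x34, 0x37]


def to_file_offsets(base, addresses):
    """Translate section-relative addresses into file offsets."""
    return [base + address for address in addresses]


file_offsets = to_file_offsets(CODE_BODY_START, LINE_TABLE_ADDRESSES)

for address, offset in zip(LINE_TABLE_ADDRESSES, file_offsets):
    print(f"{address:#06x} -> {offset:#06x}")
```

The row for 0x1e lands on 0x4b, matching the value computed by hand in the comment.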

fib2.clang.wasm.zip

kripken avatar Dec 19 '19 22:12 kripken

Also, when I load the wasm in the code explorer, it only shows 3 lines in the UI (2, 4, 7) while the debug line table also mentions line 9. Looking at that line 9 info, it starts at 0x34 which, relative to the start of the code section's body, is at 0x61 - which is past the end of the code section..?

cc @dschuff @yurydelendik

kripken avatar Dec 19 '19 22:12 kripken

> the offset of an instruction relative within the Code section of the WebAssembly file

The Code section starts at its function count LEB. Several decisions led to this:

  • We can potentially point to a function's locals bytes (see the related response below), so it was decided that it is better to start well before the first function length LEB.
  • No valid DWARF offset shall be 0, and no range shall start from 0. We are reserving that for dead symbols: when the linker cannot relocate an entry, it places 0 in the .debug_info or .debug_line table.
  • Wasm files can potentially be manipulated to remove sections (and rewrite the section header), so the decision was made to make DWARF code offsets relative to the actual code section start.
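The base described above (the function count LEB, i.e. the first byte after the section id and section size) can be located with a short sketch like the following. It assumes a well-formed module and does no validation beyond the magic bytes; the function names `read_uleb128`, `code_section_base`, and `dwarf_to_file_offset` are made up for illustration and are not taken from any existing tool.

```python
CODE_SECTION_ID = 10  # 0x0a in the WebAssembly binary format


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def code_section_base(wasm):
    """Return the file offset of the code section's function count LEB."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, payload_start = read_uleb128(wasm, pos + 1)
        if section_id == CODE_SECTION_ID:
            return payload_start
        pos = payload_start + size  # jump to the next section header
    raise ValueError("no code section found")


def dwarf_to_file_offset(wasm, dwarf_address):
    """Map a DWARF code address to a file offset under the convention above."""
    return code_section_base(wasm) + dwarf_address
```

Because the base is found by walking the section headers rather than stored as an absolute file offset, the mapping stays valid if other sections are removed or resized.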

> Can debug lines refer to code section offsets that are not code?

In theory, yes. .debug_info will have ranges that point to the entire function body. On the debugger side, a "PC" pointing at the locals bytes may signal entering a frame. This is not used at the moment, so we can change that requirement and use only offsets that point to instructions in the code section body.

> Can debug lines refer to inner parts of an instruction, and not the start?

I'm not sure DWARF has a requirement to point only to the start of an instruction.

The relocation section is definitely capable of pointing to inner parts of an instruction.

yurydelendik avatar Dec 19 '19 23:12 yurydelendik

Thanks @yurydelendik !

> No valid DWARF offset shall be 0, and no range shall start from 0. We are reserving that for dead symbols: when the linker cannot relocate an entry, it places 0 in the .debug_info or .debug_line table.

Interesting, why not just drop that line then, seems like it won't be usable later anyhow? Or is there some other use for the information?

> I'm not sure DWARF has a requirement to point only to the start of an instruction.

It would require some additional logic in binaryen to support that. I was hoping not to need it...
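One possible shape for that additional logic is sketched below: given a sorted list of instruction start offsets, snap an address to the start of the instruction that contains it. This is only an illustration and is not how binaryen handles it; `snap_to_instruction_start` and the example offsets are hypothetical, apart from 0x4b falling inside the two-byte `i32.const -1` mentioned earlier.

```python
from bisect import bisect_right


def snap_to_instruction_start(instruction_starts, address):
    """Return the start offset of the instruction that contains `address`.

    `instruction_starts` must be sorted in ascending order.
    """
    index = bisect_right(instruction_starts, address) - 1
    if index < 0:
        raise ValueError("address precedes the first instruction")
    return instruction_starts[index]


# Hypothetical layout: a two-byte i32.const -1 occupying 0x4a..0x4b.
starts = [0x48, 0x4A, 0x4C]
snapped = snap_to_instruction_start(starts, 0x4B)
```

An address that already sits on an instruction start is returned unchanged, so the same helper works for both aligned and unaligned line table rows.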

kripken avatar Dec 19 '19 23:12 kripken

> Interesting, why not just drop that line then

lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?

> seems like it won't be usable later anyhow?

It is not useful. Notice that .debug_line encodes only a few absolute offsets, and the rest are deltas. That means every row derived from a dead offset through deltas becomes invalid/dead as well.
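The effect of delta encoding can be shown with a deliberately simplified model. A real .debug_line section is a state-machine program, where DW_LNE_set_address supplies an absolute address and later opcodes advance it; the sketch below reduces that to one base address plus a list of deltas. The names `row_addresses` and `is_dead_sequence` are made up, and the deltas are derived from the dwarfdump table earlier in the thread.

```python
from itertools import accumulate


def row_addresses(base, deltas):
    """Rebuild row addresses from one absolute base and per-row deltas."""
    return list(accumulate(deltas, initial=base))


def is_dead_sequence(base):
    """A base of 0 marks an entry the linker could not relocate."""
    return base == 0


# Differences between consecutive addresses in the table (0x02, 0x0b, 0x10, ...).
DELTAS = [9, 5, 2, 12, 7, 4, 5, 6, 3]

live_rows = row_addresses(0x02, DELTAS)
dead_rows = row_addresses(0x00, DELTAS)  # same deltas, but the base was zeroed
```

Since every row depends on the base, zeroing the base shifts the whole sequence, which is why a consumer would want to discard all rows of a sequence whose base is 0 rather than only the first.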

> It would require some additional logic in binaryen to support that.

Agreed. We can recommend that for WebAssembly DWARF.

yurydelendik avatar Dec 19 '19 23:12 yurydelendik

On Thu, Dec 19, 2019 at 3:45 PM Yury Delendik wrote:

> > Interesting, why not just drop that line then
>
> lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?

Correct, the linker doesn't do anything to DWARF info other than concatenate it. This is by design.

sbc100 avatar Dec 20 '19 01:12 sbc100

Thank you for the explanation @yurydelendik!

I'm trying to parse DWARF line info for https://github.com/turbolent/w2c2 and had the same questions after reading the spec, i.e. where the start of the code section is, and whether it is normal that line addresses sometimes point to the middle of instructions. Maybe it is worth documenting this better in the spec?

I'm still a bit confused about the last part, addresses pointing to the middle of instructions. Why not require alignment?

turbolent avatar Feb 10 '22 16:02 turbolent