debugging icon indicating copy to clipboard operation
debugging copied to clipboard

Should DW_OP_Wasm_location's "index" field be an int32 so it can be easily relocatable?

Open aardappel opened this issue 5 years ago • 21 comments

DW_OP_Wasm_location currently has a type and index field that are both LEBs. For use with type TI_GLOBAL_START the index needs to be relocatable.

Doing LEB relocations in DWARF seems messy, as we need expanded LEBs that then (maybe) get re-compressed by a tool like Binaryen, if at all. int32 relocations are more common, simpler, and would better fit DWARF.

Why are we doing this at all? The first use case is having the Frame Base refer to __stack_pointer in case the function doesn't have an explicit Frame Base Local set. This is not super useful since it typically means the function has no shadow stack usage, but there may be other reasons to refer to globals from DWARF, so it be good to get this right?

For context: WebAssemblyFrameLowering::getDwarfFrameBase: https://github.com/llvm/llvm-project/blob/e8d67ada2df35ca6c70dbbe8185b0edbb18c1150/llvm/lib/Target/WebAssembly/WebAssemblyFrameLowering.cpp#L271-L274 Called from DwarfCompileUnit::updateSubprogramScopeDIE: https://github.com/llvm/llvm-project/blob/e8d67ada2df35ca6c70dbbe8185b0edbb18c1150/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L423-L432 Calls into DwarfExpression::addWasmLocation: https://github.com/llvm/llvm-project/blob/e8d67ada2df35ca6c70dbbe8185b0edbb18c1150/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp#L585-L591

This functionality originally introduced in https://reviews.llvm.org/D52634 https://reviews.llvm.org/D44184

Or maybe we should do this entirely differently? How does this affect unwinding by the debugger?

cc: @dschuff @sbc100 @kripken @yurydelendik @sunfishcode

aardappel avatar Mar 24 '20 19:03 aardappel

DW_OP_Wasm_location currently has a type and index field that are both LEBs.

I tried to keep it flexible. DW_OP_Wasm_location is just a prefix. The first LEB, type, can define next argument(s) and their types/format. So, in theory, we can add more e.g. TI_GLOBAL_START_U32.

yurydelendik avatar Mar 24 '20 19:03 yurydelendik

From Binaryen's point of view (and likely other tools doing DWARF post-processing) using int32s for such fields would be greatly preferred (almost all fields we need to update are such already).

kripken avatar Mar 24 '20 20:03 kripken

Ok, so if it's a flexible format, are we ok with it changing from a LEB to an int32?

Do we need any kind of backwards compatibility? There will currently be a single 0 byte, and in the future 4 zero bytes (for its current use). Old code expecting a LEB will thus sorta work if its ok with extra bytes?

aardappel avatar Mar 26 '20 15:03 aardappel

Ok, so if it's a flexible format, are we ok with it changing from a LEB to an int32?

I recommend to add another code in case if somebody already started writing tools. Also, it is nice to have compact form as well.

In LLVM, we can keepTI_GLOBAL_START can stay as is, except in DwarfExpression::addWasmLocation we will export TI_GLOBAL_START as int32 with new code.

yurydelendik avatar Mar 26 '20 16:03 yurydelendik

so we add a TI_GLOBAL_START_RELOC or similar?

aardappel avatar Mar 26 '20 16:03 aardappel

Why is the word START part of this identifier?

sbc100 avatar Mar 26 '20 16:03 sbc100

Why is the word START part of this identifier?

Good point. It was probably left from pilot -- we can remove "_START"s

yurydelendik avatar Mar 26 '20 17:03 yurydelendik

I can probably do that as part of my change :)

aardappel avatar Mar 26 '20 17:03 aardappel

/cc @pfaffe

dschuff avatar Mar 26 '20 22:03 dschuff

@yurydelendik

DW_OP_Wasm_location is just a prefix. The first LEB, type, can define next argument(s) and their types/format

We have definitions like: Descriptions[DW_OP_WASM_location] = Desc(Op::Dwarf4, Op::SizeLEB, Op::SignedSizeLEB); So I think I may have to introduce an additional WASM tag to make this work while retaining backwards compatibility.

DwarfExpression::addWasmLocation we will export TI_GLOBAL_START as int32 with new code

I don't see how. DwarfExpression can only build a limited set of DIE formats specific to expressions it seems, I need to be emitting something similar to what is happening in DwarfCompileUnit::addLocalLabelAddress, i.e. a DIELabel value that wraps a MCSymbol referring to __stack_pointer (which in turn has a reloc type of a global). I can emit something like DW_OP_WASM_location manually so I can emit the label, but that is basically emulating what DwarfExpression does, and very messy.

aardappel avatar Mar 26 '20 23:03 aardappel

That's unfortunate. It looks like it is LLVM's DwarfExpression limitation (and I don't have an answer for that yet).

Doing LEB relocations in DWARF seems messy, as we need expanded LEBs that then (maybe) get re-compressed by a tool like Binaryen, if at all.

Did we completely abandon the idea of using expanded LEBs here? That might not be as bad as redesigning LLVM and crating different Wasm tags.

yurydelendik avatar Mar 27 '20 14:03 yurydelendik

Did we completely abandon the idea of using expanded LEBs here? That might not be as bad as redesigning LLVM and crating different Wasm tags.

That is still an option I guess, but it doesn't get around all the issues. You still have the problem that DwarfExpression doesn't know how to write a DIELabel, so that still needs some reworking of the current abstraction, including non-wasm ones. And then additionally this DIELabel needs to be written as a LEB downstream, which I don't know how to accomplish yet.

aardappel avatar Mar 27 '20 15:03 aardappel

The more generic way for creation of relocatable DWARF expressions might be desired functionality similar e.g. for DW_OP_addr or DW_OP_addrx. (It does not look DIELabel is used in LLVM's DWARFExpression.cpp atm).

I saw similar request in the gimli-rs library.

yurydelendik avatar Mar 30 '20 18:03 yurydelendik

@yurydelendik Thanks! I see no existing use of DW_OP_addr in DwarfExpression, are you suggesting we add one? Similar to what happens in DwarfUnit::addOpAddress (https://github.com/llvm-mirror/llvm/blob/2c4ca6832fa6b306ee6a7010bfb80a3f2596f824/lib/CodeGen/AsmPrinter/DwarfUnit.cpp#L342-L354)

aardappel avatar Mar 30 '20 21:03 aardappel

I see no existing use of DW_OP_addr in DwarfExpression, are you suggesting we add one?

The implementing of DW_OP_addr is out of scope of this issue. Probably just to check if there are plans to do that.

yurydelendik avatar Mar 30 '20 21:03 yurydelendik

Ok, I've created a possible solution here: https://reviews.llvm.org/D77353

The output in .S is now:

	.int8	7                       # DW_AT_frame_base
	.int8	238                     # DW_OP_WASM_location_int
	.int8	3                       # TI_GLOBAL_RELOC
	.int32	__stack_pointer
	.int8	159                     # DW_OP_stack_value

Before it was:

	.int8	7                       # DW_AT_frame_base
	.int8	237                     # DW_OP_WASM_location
	.int8	1                       # TI_GLOBAL_FIXED
	.int8	0
	.int8	159                     # DW_OP_stack_value

Dwarfdump says: DW_AT_frame_base [DW_FORM_exprloc] (DW_OP_WASM_location 0x3 +0, DW_OP_stack_value) (only 1 changed to 3.

As discussed, the new code replicates functionality of what normally DwarfExpression does, but luckily the code is not particularly complicated. Making DwarfExpression be able to deal with labels would be an alternative way of implementing this, but this is complicated by the fact that DwarfExpression is fundamentally a list of different integer types, and the fact that it has two different paths to deal with buffering. Plumbing the ability to understand symbols thru this (which apparently no other target needs) would likely not be pretty.

Note that it would be nice to use the same code for the else block as well, make the code more uniform, and delete addWasmLocation and the use of DwarfExpression for constructing these entirely, if it wasn't for the fact that it is also used from DwarfDebug::emitDebugLocValue for locals.

I haven't tested this code beyond checking the above output, and modifying the tests that used type 1.. if this solution is acceptable, let me know if you think additional tests would be good.

Oh and then there's the small matters of not wanting to include WebAssembly.h in general code for TI_GLOBAL_RELOC, and maybe at some point wanting to factor out the hard-coded reference to __stack_pointer

aardappel avatar Apr 03 '20 00:04 aardappel

Final call for comments on this issue or https://reviews.llvm.org/D77353 :)

aardappel avatar Apr 09 '20 18:04 aardappel

Just to clarify, that the requirement of having int32 as an argument for DW_OP_Wasm_location comes from external tools (such as Binaryen?) and not from LLVM itself. Reading the patch at https://reviews.llvm.org/D77353 shows that we already had R_WASM_GLOBAL_INDEX_LEB to assist with DW_OP_Wasm_location wasm-global relocation at LLVM side (assuming LEB is padded).

The added R_WASM_GLOBAL_INDEX_I32 will make life easier for other tools, is it correct?

yurydelendik avatar Apr 09 '20 21:04 yurydelendik

Is there anything needs to be changed at https://github.com/yurydelendik/webassembly-dwarf/ to document the new encoding?

yurydelendik avatar Apr 09 '20 22:04 yurydelendik

@yurydelendik I am not sure if there is a requirement, I am simply going with the consensus earlier in this issue about what is a better representation. The primary consumer of this would be LLD I think.

And yes, we should probably document it, but it doesn't seem that it is final yet.

aardappel avatar Apr 13 '20 16:04 aardappel

Can people give their opinion on which tag encoding they prefer (see the end of https://reviews.llvm.org/D77353)

aardappel avatar Apr 13 '20 16:04 aardappel