walrus
Start tracking the debug information origin for each instruction
This is intended to be a tracking issue for now that's somewhat light on the details, but I wanted to make sure we had one tagged as sprint for our first 6 week sprint!
@fitzgen do you want to fill in some more particulars here?
I'm imagining that each instruction would have something like this associated with it:
enum Whatever {
    NoDebugInfo,
    DebugInfoForFileOffset(u64),
    DebugInfo(DebugInfoId),
}
We would have methods for providing this info whenever creating new instructions.
The DebugInfoForFileOffset variant just keeps track of the instruction's (perhaps transitive) origin in an input wasm file.
As we emit wasm, we read the input DWARF and move from DebugInfoForFileOffset to the DebugInfo variant.
The gimli writer api already has arenas and ids, so we can either newtype all that stuff or just expose and use it directly.
This is all pretty half-baked. I have also been meaning to look closer at how LLVM structures its debug info APIs.
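To make the proposal above a bit more concrete, here is a minimal sketch of the emit-time step that moves an instruction from the file-offset variant to the resolved variant. Everything in it is hypothetical and not part of walrus or gimli: `InstrOrigin` stands in for the placeholder enum name, `DebugInfoId` is a plain newtype rather than a real arena id, the lookup table is assumed to have been built from the input DWARF beforehand, and falling back to `NoDebugInfo` for unknown offsets is just one possible policy.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an id into a debug-info arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DebugInfoId(usize);

/// Hypothetical name for the per-instruction origin enum sketched in the proposal.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstrOrigin {
    /// The instruction was synthesized and has no source origin.
    NoDebugInfo,
    /// The instruction came (perhaps transitively) from this offset in the input wasm file.
    DebugInfoForFileOffset(u64),
    /// The origin has been resolved to an entry in the debug-info arena.
    DebugInfo(DebugInfoId),
}

/// Resolve a file-offset origin into a debug-info id.
///
/// `by_offset` is assumed to map input file offsets to ids and to have been
/// built while reading the input DWARF. Offsets with no entry fall back to
/// `NoDebugInfo` (an assumption of this sketch); other variants pass through.
fn resolve(origin: InstrOrigin, by_offset: &HashMap<u64, DebugInfoId>) -> InstrOrigin {
    match origin {
        InstrOrigin::DebugInfoForFileOffset(offset) => match by_offset.get(&offset) {
            Some(&id) => InstrOrigin::DebugInfo(id),
            None => InstrOrigin::NoDebugInfo,
        },
        other => other,
    }
}
```

Keeping the unresolved and resolved states in one enum means transformation passes only need to copy the origin along with the instruction, and the DWARF lookup happens once, at emit time.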
I have started playing with walrus + DWARF. I am now able to obtain the DWARF info for each InstrLocId, but I still have to manually figure out the mapping between the DWARF offsets (where zero means the location where the function's body starts, so usually where its locals are defined) and InstrLocId::data(), which gives me the offset from the start of the wasm blob.
Is the idea to take the offset at the beginning of decoding a function and make the InstrLocId be relative to that? Or should we keep InstrLocId as it is and instead record the beginning byte offset and expose it so users of InstrLocId can compute the offset?
Are there any updates? I am curious to hear if this is moving forward and if we can soon start debugging Rust code in Chrome via DWARF debugging information.
I'm still trying to figure out the base address so I can map DWARF offsets to instruction byte addresses. The problem I have is that I don't know enough about it all to be sure that there's always just one section in DWARF for wasm and consequently only one base address. I could just grab the base address by looking at the code section's base address? Or is that just true in my current setup?
I'm guessing, but https://webassembly.github.io/spec/core/binary/modules.html#binary-codesec isn't clear on that. I would think that there are multiple DWARF sections, one for each module's code section? Can there even be multiple modules in one .wasm file? Nothing in walrus' documentation hints at that being possible.
@oli-cosmian the "base address" within the WASM DWARF is basically 0. The offset of the code section is only added when addresses are used at the external interface.
So, for instance, an address of 0x10 for a function in DWARF, with the code section offset being 0x400, would be reported in stack traces as 0x410 by browser runtimes. Multiple modules in one .wasm file would be impossible, at least as far as DWARF is concerned.
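The arithmetic described above can be sketched as a pair of conversions. This is only an illustration of the explanation in this comment, not walrus API: the function names are made up, and it assumes the caller already knows the code section's offset within the .wasm file and that file-relative offsets (such as those from InstrLocId::data()) are measured from the start of the file.

```rust
/// Convert a DWARF address (relative to the code section, base address 0)
/// into an offset from the start of the .wasm file.
/// Returns `None` if the addition overflows.
fn dwarf_addr_to_file_offset(code_section_offset: u64, dwarf_addr: u64) -> Option<u64> {
    code_section_offset.checked_add(dwarf_addr)
}

/// Convert an offset from the start of the .wasm file back into a DWARF
/// address. Returns `None` if the offset lies before the code section.
fn file_offset_to_dwarf_addr(code_section_offset: u64, file_offset: u64) -> Option<u64> {
    file_offset.checked_sub(code_section_offset)
}
```

With the numbers from the example, `dwarf_addr_to_file_offset(0x400, 0x10)` gives `Some(0x410)`, and the reverse direction maps `0x410` back to `Some(0x10)`.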
Sorry if I may seem impatient but is there any update on this? I would be interested to try and work on this. Only problem is that I have very little experience manipulating WASM or executables in general. Any pointers would be appreciated.
Any updates on this?
We're building a WebGL render engine with rust-bindgen and we have about 100k lines of code in Rust. Debugging with logging is OK, but it always feels like an essential tool is missing when new guys join the team and try to understand the runtime behavior. It would be a great help to productivity on a project of this size if we could make the debugging functionality work.
I would love to help, but I don't understand the background of the problem here. Could someone shed some light on it? Maybe we can solve the problem if we have enough people working on it together.
I just noticed @nokaten is working on a promising one: https://github.com/rustwasm/walrus/pull/231
Just cross-posting from the other thread, the above-mentioned work landed here https://github.com/rustwasm/walrus/pull/244