Source maps applied to wasm binaries
This PR proposes applying source maps to wasm binaries. Source maps obviously have a lot of known limitations and are not a long-term debugging solution. However they are widely supported (including by emscripten for asm.js code) and based on conversations with Chrome and Firefox dev tools folks, my hope is that it will be easy and straightforward to adapt the existing dev tools code to apply to wasm binaries in the meantime, and bring wasm to parity with asm.js.
Couldn't the wasm be disassembled to wast textual format and the links to the original source be attached to lines in the wast (rather than an offset into the wasm binary)? Or perhaps both?
@kanaka IIRC In Chrome and Firefox, currently when displaying wast (instead of original source) the JS engine disassembles the wasm binary and passes the wast text to the devtools. Using source maps would mean the source would be displayed instead of the wast. So in that case you wouldn't need the wast at all. Are you suggesting that the wast be generated offline, and then passed to the browser along with a source map? In that case we'd still need a way to link from the wasm binary to the wast. In that sense the wast would be sort of like a mapped source for the binary. But it wouldn't really save much because browsers would probably still want to have some support for the case where there is nothing but the binary. Which means they'd probably still have to support disassembling it themselves, so I would guess that passing the disassembly from the outside wouldn't really save anything or make things any simpler.
I was thinking a binary source map format designed for WASM specifically would be more desirable than re-using the JavaScript format. But I'm not a browser vendor so I'm not sure how troublesome that might be.
@RyanLamansky I agree that in principle a source-mapping format meant for wasm would not look like JavaScript source maps. However the intention of this convention is just to capture some really low-hanging fruit, because what we ultimately want is something much more capable than sourcemaps, and that will either have to be designed independently, or re-used from some other existing format (e.g. DWARF). So in one sense, this isn't worth doing at all if it isn't really simple, because it will be superseded, hopefully before too long. Therefore it should fit best with what browsers or devtools are already doing with source maps, which means that it would be nice not to require them to parse a format which is new, but also not the final form that we want.
Could you also update https://github.com/WebAssembly/design/blob/master/FutureFeatures.md#source-maps-integration ?
Pinging this back to life :-) Looks like this is waiting on some fixes from Derek.
OK, I think I have addressed the comments, although I'm not 100% sure if @domenic had anything more in mind beyond what I did about URLs and resolution.
It seems reasonable given the general level of detail here. Thanks!
Demo compiled using emscripten that is using source maps https://yurydelendik.github.io/sqlite-playground/src/playground.html (source at https://github.com/yurydelendik/sqlite-playground)
I'm wondering if we should land this patch in the source map spec rather than in the WebAssembly spec. I filed an issue in the source map spec's tracker to follow this. JavaScript and CSS leave source maps to be described in the source map spec, so why should WebAssembly be different here?
I started the discussion here to get feedback from wasm folks but I agree that it makes more sense to go in the source map spec, and wasm shouldn't be different from JS.
source map spec's tracker
FYI, that RFC repo never caught on, and I ultimately could never get anyone (authors of compilers targeting JS, browser devtools implementors) to participate at the time.
Since source maps are a stop gap for now, but ultimately a dead end, I'd suggest two things:
- Just implementing and documenting the column=byte offset thing.
- Starting a new effort to create a real debugging format (or adopting DWARF). I would be happy to participate in such an effort.
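To make the "column = byte offset" idea concrete, here is a minimal sketch. It assumes the source map has already been decoded into entries sorted by byte offset, with the wasm byte offset stored in the field that JS source maps use for the generated column. The `Entry` type, the `lookup` helper, and the sample offsets and file names are hypothetical and not part of any spec.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    byte_offset: int  # carried in the source map's "generated column" field
    source: str       # original source file
    line: int         # line in the original source
    column: int       # column in the original source


def lookup(entries: List[Entry], byte_offset: int) -> Optional[Entry]:
    """Return the last entry that starts at or before byte_offset, if any."""
    offsets = [e.byte_offset for e in entries]
    i = bisect_right(offsets, byte_offset)
    return entries[i - 1] if i else None


# Hypothetical, already-decoded entries, sorted by byte offset.
entries = [
    Entry(0x40, "src/foo.rs", 12, 4),
    Entry(0x47, "src/foo.rs", 13, 8),
    Entry(0x60, "src/bar.rs", 3, 0),
]
```

A devtools frontend that already does this lookup by (line, column) for JS would only need to treat the whole binary as a single line.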
@fitzgen Since sourcemaps are web reality, and it'd likely hurt debugging in the real world if we just removed them in favor of a new version, maybe we should both strengthen the specification and conformance tests for existing sourcemaps as well as adding a v4 version. We'll need to get all the stakeholders involved whether we just do a new version or whether we also improve the specification for the previous version, anyway.
Would people be interested in getting together for a call to discuss rebooting the sourcemaps spec?
cc @foolip @ak239 @concavelenz
I have no previous involvement with sourcemaps, maybe @mathiasbynens knows who might know?
I'm definitely interested in moving debugging forward. This particular proposal is just a small adaptation to the existing scope and functionality of source maps (i.e. as purely a line-table facility but not full debug info), so I'm happy to put it wherever people think it belongs, and contribute tests and docs where appropriate.
For "real" debugging, that's an entirely new class of capabilities (even in DWARF, line tables are a separate section and fairly independent from the rest of the debug info). If there's a new data format (regardless of whether it's sourcemap-like or DWARF-like or something else) it could be pretty independent too.
Beyond that though, I'm actually not fully convinced that a file format is even the best thing to standardize on as the primary interoperability layer; something like an API or protocol might have some advantages; but I guess exactly what that means can be discussed somewhere other than this PR.
As the author of Scala.js, I would be interested to be part of discussions about improving source maps. In our community we are painfully aware of a number of shortcomings of source maps.
On the DevTools side, we think the current static format of source maps may not be suitable for WebAssembly, and we are looking for a more dynamic format. I can imagine some DWARF-based ideas, as well as something similar to language services, or an adapter over the DevTools protocol, which is well supported by the industry.
Would folks here be interested in attending a video call in the next month or two to get started on improving debugging metadata, for both Wasm and JS?
I would be interested @littledan, even if I didn't participate much here?
@littledan My own internal plans for this year include participating in adding source maps and debugging to wasm in JSC, so I would be interested. Are there any free places left?
I agree with @dschuff. Instead of a file format, we should standardize a debugging protocol API (similar to JVMTI or JDWP) which would give programmatic access to the WASM-machine-level state and give control for stopping, stepping, etc. Then DWARF and any other file format can be processed in a "user space" tool that just interacts with the WebAssembly engine via this standardized protocol.
IMO this has the advantage of keeping the WebAssembly engine completely independent from any particular debugging format, making it maximally future-proof. (I also think it would be the simplest thing to implement at the engine layer).
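As a rough illustration of that split, here is a sketch in which the engine only knows about byte offsets and a "user space" tool owns the source-level mapping. Everything here is hypothetical: `EngineDebugApi`, `FakeEngine`, `break_on_line`, and the line table are invented for the example and do not correspond to any existing engine or protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set, Tuple


@dataclass(frozen=True)
class Pause:
    code_offset: int  # byte offset of the instruction the engine stopped at


class EngineDebugApi(Protocol):
    """Hypothetical engine surface: only wasm-machine-level concepts."""

    def set_breakpoint(self, code_offset: int) -> None: ...
    def resume(self) -> Pause: ...


class FakeEngine:
    """Stand-in for a real engine: replays a fixed trace of executed offsets."""

    def __init__(self, trace: List[int]) -> None:
        self._trace = list(trace)
        self._breakpoints: Set[int] = set()

    def set_breakpoint(self, code_offset: int) -> None:
        self._breakpoints.add(code_offset)

    def resume(self) -> Pause:
        while self._trace:
            offset = self._trace.pop(0)
            if offset in self._breakpoints:
                return Pause(offset)
        raise RuntimeError("program finished without hitting a breakpoint")


def break_on_line(engine: EngineDebugApi,
                  line_table: Dict[Tuple[str, int], int],
                  source: str, line: int) -> Pause:
    """User-space tool: translate a source line to an offset, then drive the engine."""
    engine.set_breakpoint(line_table[(source, line)])
    return engine.resume()
```

The point of the sketch is only that `line_table` could come from source maps, DWARF, or anything else without the engine-side interface changing.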
I tend to agree that specifying this kind of debugging protocol API seems to make sense, but I know that @fitzgen had some good arguments against it that'd need to be discussed. I'm quite sure I'd butcher his arguments, so I won't try to represent them here.
Meanwhile, @yurydelendik is working on extensions to DWARF that would make it work for Wasm, with the intent to embed it into custom sections and synthesize source maps from them on the fly. This is obviously not something we'll want to standardize in this form, or probably even ship in a release build. It is a good step towards validating the viability of using DWARF as the format for representing this information.
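For a sense of what "synthesizing source maps on the fly" could involve, here is a minimal sketch that turns line-table rows into a source map `mappings` string using the standard base64 VLQ encoding. It assumes the rows have already been extracted from the debug info (that part is not shown), and that all segments sit on a single generated line with the wasm byte offset in the column field. The helper names and the row layout are assumptions made for the example.

```python
from typing import Iterable, Tuple

BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_vlq(value: int) -> str:
    """Encode one signed integer as a source-map base64 VLQ."""
    # The sign goes in the lowest bit, the magnitude in the remaining bits.
    vlq = ((-value) << 1) | 1 if value < 0 else value << 1
    out = ""
    while True:
        digit = vlq & 0b11111
        vlq >>= 5
        if vlq:
            digit |= 0b100000  # continuation bit: more digits follow
        out += BASE64[digit]
        if not vlq:
            return out


def synthesize_mappings(rows: Iterable[Tuple[int, int, int, int]]) -> str:
    """Build a single-line "mappings" string.

    Each row is (wasm byte offset, source index, 0-based line, 0-based column),
    and rows must be sorted by byte offset. Every field is delta-encoded
    against the previous segment.
    """
    prev = (0, 0, 0, 0)
    segments = []
    for row in rows:
        segments.append("".join(encode_vlq(cur - old) for cur, old in zip(row, prev)))
        prev = row
    return ",".join(segments)
```

For example, `synthesize_mappings([(0x40, 0, 11, 4), (0x47, 0, 12, 8)])` yields two comma-separated segments, one per row.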
I would like to join the video call if it's possible.
Hello all!
I am a little concerned that when the general subject of "debug info for wasm" comes up, folks are only thinking of the stepping debugger use case.
Here is a concrete list of debug info I wished that wasm had when I was writing tooling for wasm or using wasm tooling:
- Which logical functions are inlined within a given physical function? Within the function's body, which byte ranges can be attributed to each inlined function, and which to the physical function itself?
  - My first use case for this information is in the twiggy code size profiler, where I want to find large functions that are getting inlined many times, and might be causing bloat.
  - My second use case is: when profiling WebAssembly code, sampled stack frames are much less useful than their JavaScript counterparts, because only physical frames are displayed. Profilers should display inlined functions' frames as well, like they do for JavaScript (with assistance from the JS engine, rather than debug info).
  - As a precedent, this information is present in DWARF's .debug_info section, as a DW_TAG_inlined_subroutine entry in the debugging information entry (DIE) tree:

        < 4><0x00000454> DW_TAG_subprogram
            DW_AT_low_pc 0x00016d30
            DW_AT_high_pc <offset-from-lowpc>635
            DW_AT_frame_base len 0x0001: 56: DW_OP_reg6
            DW_AT_linkage_name _ZN12cpp_demangle3ast11BuiltinType8demangle17h174c82f68696be09E
            DW_AT_name demangle<&mut alloc::vec::Vec<u8>>
            DW_AT_decl_file 0x00000001 /home/fitzgen/cpp_demangle/src/ast.rs
            DW_AT_decl_line 0x00000c35
            DW_AT_type <0x00016a94>
        ...
        < 6><0x000004ca> DW_TAG_lexical_block
            DW_AT_low_pc 0x00016d51
            DW_AT_high_pc <offset-from-lowpc>596
        < 7><0x000004d7> DW_TAG_variable
            DW_AT_location <loclist at offset 0x00009e83 with 1 entries follows>
              [ 0]< offset pair low-off : 0x00016d4b addr 0x00016d4b high-off 0x00016d59 addr 0x00016d59>DW_OP_breg4+1 DW_OP_stack_value
            DW_AT_name ty
            DW_AT_alignment 0x00000001
            DW_AT_decl_file 0x00000001 /home/fitzgen/cpp_demangle/src/ast.rs
            DW_AT_decl_line 0x00000c40
            DW_AT_type <0x00018731>
        < 7><0x000004e8> DW_TAG_inlined_subroutine
            DW_AT_abstract_origin <0x00000636>
            DW_AT_low_pc 0x00016d51
            DW_AT_high_pc <offset-from-lowpc>596
            DW_AT_call_file 0x00000001 /home/fitzgen/cpp_demangle/src/ast.rs
            DW_AT_call_line 0x00000c40
        < 8><0x000004fc> DW_TAG_inlined_subroutine
            DW_AT_abstract_origin <0x00016893>
            DW_AT_low_pc 0x00016f63
            DW_AT_high_pc <offset-from-lowpc>58
            DW_AT_call_file 0x00000003 /home/fitzgen/cpp_demangle/<write macros>
            DW_AT_call_line 0x00000002
        < 9><0x0000050f> DW_TAG_formal_parameter
            DW_AT_location len 0x0007: 930810019f9308: DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
            DW_AT_abstract_origin <0x000168a4>
        < 9><0x0000051c> DW_TAG_formal_parameter
            DW_AT_location <loclist at offset 0x00009ea8 with 2 entries follows>
              [ 0]< offset pair low-off : 0x00016f63 addr 0x00016f63 high-off 0x00016f8d addr 0x00016f8d>DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
              [ 1]< offset pair low-off : 0x00016f8d addr 0x00016f8d high-off 0x00016fab addr 0x00016fab>DW_OP_breg6-16 DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
            DW_AT_abstract_origin <0x000168af>
        < 9><0x00000525> DW_TAG_formal_parameter
            DW_AT_location len 0x0007: 930810019f9308: DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
            DW_AT_abstract_origin <0x000168ba>
        ...
- Is a given function a monomorphization of some generic function?
  - Again, I want this information in twiggy, so I can try and identify generic functions that have been monomorphized "too many" times and are leading to code bloat.
  - DWARF doesn't represent the generic functions themselves, but does have DW_TAG_template_{type,value}_parameter entries, so you can mostly reconstruct this information.
- Mapping source locations to wasm byte codes and back.
  - The one thing source maps can do :)
  - DWARF's equivalent is in the .debug_line section.
  - For twiggy, I want to say where in the source call graph edges are originating from. If you expect blah_function to be dead code, and removed by the linker, but it isn't, I want to be able to point at a series of "src/foo.rs:12:34 in lolo_method" as the reasons why the blah_function is not dead code and is included in the .wasm binary.
I want to highlight that this is just the concrete use cases I have run into so far. As we do more with wasm, this list will only grow. Any kind of instrumentation or static analysis tooling is going to want to consume debug info.
Why do I want to highlight debug info consumer use cases besides stepping debuggers? Because I want to make sure that you don't have to be running the debuggee program just to get information about it. A size profiler, a static analyzer, and an instrumentation-based code coverage tool shouldn't have to run the .wasm binary they are working on.
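As an example of the kind of purely static query a size profiler might run, here is a minimal sketch that attributes bytes to a physical function and to the functions inlined into it. It uses the three low_pc/size pairs from the DIE dump above and assumes ranges are properly nested (no partial overlaps). The labels are just DIE offsets, since the dump does not show the inlined functions' names, and `self_sizes` is a hypothetical helper, not part of twiggy.

```python
from typing import Dict, List, Tuple

# (label, low_pc, size in bytes), taken from the DIE dump above.
RANGES: List[Tuple[str, int, int]] = [
    ("subprogram <0x00000454>", 0x00016D30, 635),
    ("inlined <0x000004e8>", 0x00016D51, 596),
    ("inlined <0x000004fc>", 0x00016F63, 58),
]


def self_sizes(ranges: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Bytes attributed to each range after subtracting ranges nested directly inside it."""
    sizes = {label: size for label, _, size in ranges}
    # Sort so that an enclosing range always comes before the ranges it contains.
    ordered = sorted(ranges, key=lambda r: (r[1], -r[2]))
    stack: List[Tuple[str, int, int]] = []
    for label, low, size in ordered:
        # Pop ranges that end at or before this one starts; they cannot contain it.
        while stack and low >= stack[-1][1] + stack[-1][2]:
            stack.pop()
        if stack:
            # The innermost enclosing range loses the bytes owned by this one.
            sizes[stack[-1][0]] -= size
        stack.append((label, low, size))
    return sizes
```

None of this requires running the module; it only needs the debug info to be readable offline.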
@littledan I would love to join a video call to discuss this topic :)
Let's set up a video call. Are people available next week? I will set up a doodle if I get enough thumbs up. This would be the week of May 1-5. @fitzgen @littledan @xtuc @chicoxyzzy @gskachkov pinging you specifically since you expressed interest. Everyone else is also invited!
I am also very interested, but already overbooked until May 8 included.
We can expand to the week of May 7-11 (thumbs up on this comment if this works better for you)
Hi there, I have a small question about wasm debugging. Since wasm can run on mobile devices, where we can't run DevTools, wasm should support some kind of remote debug protocol that allows debugging wasm modules on mobile devices. Will a remote debug protocol be part of the wasm debug specification, or will each browser handle it with its own remote debugging protocol?
@codehag please invite @yurydelendik to this call, too.