
Source maps applied to wasm binaries

Open dschuff opened this issue 8 years ago • 50 comments

This PR proposes applying source maps to wasm binaries. Source maps obviously have a lot of known limitations and are not a long-term debugging solution. However they are widely supported (including by emscripten for asm.js code) and based on conversations with Chrome and Firefox dev tools folks, my hope is that it will be easy and straightforward to adapt the existing dev tools code to apply to wasm binaries in the meantime, and bring wasm to parity with asm.js.

dschuff avatar May 02 '17 01:05 dschuff

Couldn't the wasm be disassembled to wast textual format and the links to the original source be attached to lines in the wast (rather than an offset into the wasm binary)? Or perhaps both?

kanaka avatar May 02 '17 02:05 kanaka

@kanaka IIRC In Chrome and Firefox, currently when displaying wast (instead of original source) the JS engine disassembles the wasm binary and passes the wast text to the devtools. Using source maps would mean the source would be displayed instead of the wast. So in that case you wouldn't need the wast at all. Are you suggesting that the wast be generated offline, and then passed to the browser along with a source map? In that case we'd still need a way to link from the wasm binary to the wast. In that sense the wast would be sort of like a mapped source for the binary. But it wouldn't really save much because browsers would probably still want to have some support for the case where there is nothing but the binary. Which means they'd probably still have to support disassembling it themselves, so I would guess that passing the disassembly from the outside wouldn't really save anything or make things any simpler.

dschuff avatar May 02 '17 22:05 dschuff

I was thinking a binary source map format designed for WASM specifically would be more desirable than re-using the JavaScript format. But I'm not a browser vendor so I'm not sure how troublesome that might be.

RyanLamansky avatar May 02 '17 23:05 RyanLamansky

@RyanLamansky I agree that in principle a source-mapping format meant for wasm would not look like JavaScript source maps. However the intention of this convention is just to capture some really low-hanging fruit, because what we ultimately want is something much more capable than sourcemaps, and that will either have to be designed independently, or re-used from some other existing format (e.g. DWARF). So in one sense, this isn't worth doing at all if it isn't really simple, because it will be superseded, hopefully before too long. Therefore it should fit best with what browsers or devtools are already doing with source maps, which means that it would be nice not to require them to parse a format which is new, but also not the final form that we want.

dschuff avatar May 03 '17 17:05 dschuff

Could you also update https://github.com/WebAssembly/design/blob/master/FutureFeatures.md#source-maps-integration ?

jfbastien avatar May 15 '17 18:05 jfbastien

Pinging this back to life :-) Looks like this is waiting on some fixes from Derek.

flagxor avatar Aug 23 '17 21:08 flagxor

OK, I think I have addressed the comments, although I'm not 100% sure if @domenic had anything more in mind beyond what I did about URLs and resolution.

dschuff avatar Aug 24 '17 18:08 dschuff

It seems reasonable given the general level of detail here. Thanks!

domenic avatar Aug 24 '17 18:08 domenic

Demo compiled using emscripten that is using source maps https://yurydelendik.github.io/sqlite-playground/src/playground.html (source at https://github.com/yurydelendik/sqlite-playground)

yurydelendik avatar Oct 13 '17 17:10 yurydelendik

I'm wondering if we should land this patch in the source map spec rather than in the WebAssembly spec. I filed an issue in the source map spec's tracker to follow this. JavaScript and CSS leave source maps to be described in the source map spec, so why should WebAssembly be different here?

littledan avatar Mar 05 '18 22:03 littledan

I started the discussion here to get feedback from wasm folks but I agree that it makes more sense to go in the source map spec, and wasm shouldn't be different from JS.

dschuff avatar Mar 06 '18 19:03 dschuff

source map spec's tracker

FYI, that RFC repo never caught on, and I ultimately could never get anyone (authors of compilers targeting JS, browser devtools implementors) to participate at the time.

Since source maps are a stop gap for now, but ultimately a dead end, I'd suggest two things:

  1. Just implementing and documenting the column=byte offset thing.

  2. Starting a new effort to create a real debugging format (or adopting DWARF). I would be happy to participate in such an effort.
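
To illustrate the column=byte offset idea: a standard source map v3 `mappings` string is decoded as usual, but the whole wasm binary is treated as a single generated line, so the generated column is read as a byte offset. The Python sketch below rests on assumptions that this thread does not settle: offsets are counted from the start of the module, the helper names (`decode_vlq`, `build_table`, `lookup`) are made up for illustration, and the sample `mappings` string is hand-built rather than taken from a real toolchain.

```python
import bisect

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_vlq(segment):
    """Decode one base64-VLQ segment of a source map into a list of ints."""
    values, acc, shift = [], 0, 0
    for ch in segment:
        digit = B64.index(ch)
        acc |= (digit & 0x1F) << shift      # low 5 bits carry data
        if digit & 0x20:                    # bit 5 set: more digits follow
            shift += 5
        else:
            sign = -1 if acc & 1 else 1     # lowest bit of the value is the sign
            values.append(sign * (acc >> 1))
            acc, shift = 0, 0
    return values

def build_table(mappings):
    """Return sorted (byte_offset, source_index, line, column) tuples.

    Assumption: the binary is one generated line, so only the text before
    the first ';' is used and the generated column is a byte offset.
    """
    table = []
    offset = src = line = col = 0
    for seg in mappings.split(";")[0].split(","):
        if not seg:
            continue
        fields = decode_vlq(seg)
        offset += fields[0]                 # all fields are deltas
        if len(fields) >= 4:
            src += fields[1]
            line += fields[2]
            col += fields[3]
            table.append((offset, src, line, col))
    return table

def lookup(table, byte_offset):
    """Find the mapping entry covering byte_offset, or None if before the first."""
    keys = [entry[0] for entry in table]
    i = bisect.bisect_right(keys, byte_offset) - 1
    return table[i] if i >= 0 else None

# Hand-built example: two entries, at byte offsets 4 and 20.
table = build_table("IAAM,gBACA")
```

Here `lookup(table, 10)` resolves to the entry that starts at byte offset 4, which is the same "closest preceding mapping" rule devtools already apply to JavaScript columns.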

fitzgen avatar Mar 06 '18 19:03 fitzgen

@fitzgen Since sourcemaps are web reality, and it'd likely hurt debugging in the real world if we just removed them in favor of a new version, maybe we should both strengthen the specification and conformance tests for existing sourcemaps as well as adding a v4 version. We'll need to get all the stakeholders involved whether we just do a new version or whether we also improve the specification for the previous version, anyway.

Would people be interested in getting together for a call to discuss rebooting the sourcemaps spec?

cc @foolip @ak239 @concavelenz

littledan avatar Mar 06 '18 22:03 littledan

I have no previous involvement with sourcemaps, maybe @mathiasbynens knows who might know?

foolip avatar Mar 16 '18 04:03 foolip

I'm definitely interested in moving debugging forward. This particular proposal is just a small adaptation to the existing scope and functionality of source maps (i.e. as purely a line-table facility but not full debug info), so I'm happy to put it wherever people think it belongs, and contribute tests and docs where appropriate.

For "real" debugging, that's an entirely new class of capabilities (even in DWARF, line tables are a separate section and fairly independent from the rest of the debug info). If there's a new data format (regardless of whether it's sourcemap-like or DWARF-like or something else) it could be pretty independent too.

Beyond that though, I'm actually not fully convinced that a file format is even the best thing to standardize on as the primary interoperability layer; something like an API or protocol might have some advantages; but I guess exactly what that means can be discussed somewhere other than this PR.

dschuff avatar Mar 16 '18 17:03 dschuff

As the author of Scala.js, I would be interested to be part of discussions about improving source maps. In our community we are painfully aware of a number of shortcomings of source maps.

sjrd avatar Mar 16 '18 18:03 sjrd

On the DevTools side, we think the current static format of source maps may not be suitable for WebAssembly, and we are looking for some dynamic format. I can imagine some DWARF-based ideas, as well as something similar to language services, or an adapter over the DevTools protocol, which is well supported by the industry.

alexkozy avatar Mar 16 '18 18:03 alexkozy

Would folks here be interested in attending a video call in the next month or two to get started on improving debugging metadata, for both Wasm and JS?

littledan avatar Mar 17 '18 16:03 littledan

I would be interested @littledan, even if I didn't participate much here?

xtuc avatar Mar 17 '18 18:03 xtuc

@littledan My own internal plans for this year include participating in adding source maps and debugging to wasm in JSC, so I would be interested. Are there any free places left?

gskachkov avatar Mar 17 '18 22:03 gskachkov

I agree with @dschuff. Instead of a file format, we should standardize a debugging protocol API (similar to JVMTI or JDWP) which would give programmatic access to the WASM-machine-level state and give control for stopping, stepping, etc. Then DWARF and any other file format can be processed in a "user space" tool that just interacts with the WebAssembly engine via this standardized protocol.

IMO this has the advantage of keeping the WebAssembly engine completely independent from any particular debugging format, making it maximally future-proof. (I also think it would be the simplest thing to implement at the engine layer).
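
As a rough sketch of that split, and nothing more: the Python below invents a tiny engine-level interface (`EngineDebugApi`, with only `current_offset` and `step`) and builds a source-level "step to the next line" on top of it in user space. Every name here is hypothetical; no such API is defined in this thread, `FakeEngine` is a stand-in that walks a fixed list of byte offsets, and `line_of` stands for whatever DWARF or source map decoder the tool chooses.

```python
from typing import Callable, Protocol

class EngineDebugApi(Protocol):
    """Hypothetical engine-level interface: machine state and control only."""
    def current_offset(self) -> int: ...
    def step(self) -> None: ...

class FakeEngine:
    """Stand-in engine that walks a fixed list of instruction byte offsets."""
    def __init__(self, offsets):
        self._offsets = list(offsets)
        self._index = 0

    def current_offset(self) -> int:
        return self._offsets[self._index]

    def step(self) -> None:
        # Stay on the last instruction once the program has finished.
        if self._index < len(self._offsets) - 1:
            self._index += 1

def step_source_line(engine: EngineDebugApi,
                     line_of: Callable[[int], int]) -> int:
    """User-space stepping: single-step until the mapped source line changes.

    The engine knows nothing about source lines; the mapping from byte
    offsets to lines is supplied entirely by the tool via line_of.
    """
    start_line = line_of(engine.current_offset())
    while True:
        before = engine.current_offset()
        engine.step()
        now = engine.current_offset()
        if now == before or line_of(now) != start_line:
            return line_of(now)

# Toy mapping: offsets below 20 belong to line 1, the rest to line 2.
engine = FakeEngine([4, 7, 9, 20, 25])
reached = step_source_line(engine, lambda off: 1 if off < 20 else 2)
```

The point of the shape is that swapping the debug format only changes `line_of`; the engine-facing surface stays the same.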

titzer avatar Mar 19 '18 09:03 titzer

I tend to agree that specifying this kind of debugging protocol API seems to make sense, but I know that @fitzgen had some good arguments against it that'd need to be discussed. I'm quite sure I'd butcher his arguments, so I won't try to represent them here.

Meanwhile, @yurydelendik is working on extensions to DWARF that would make it work for Wasm, with the intent to embed it into custom sections and synthesize source maps from them on the fly. This is obviously not something we'll want to standardize in this form - or probably even ship in a release build. It is a good step towards validating the viability of using DWARF as the format for representing this information.
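
For readers unfamiliar with custom sections: in the wasm binary format a section is an id byte, a LEB128 size, and a payload, and a custom section (id 0) starts its payload with a length-prefixed UTF-8 name. The Python sketch below lists the custom sections of a module, which is roughly the first step any tool would take to find embedded debug data. The function names are illustrative, and the section name `.debug_line` in the hand-built test module is an assumption borrowed from DWARF's own section naming, not something specified in this thread.

```python
def read_uleb128(data, pos):
    """Read an unsigned LEB128 integer; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:         # high bit clear: last byte of the number
            return result, pos
        shift += 7

def custom_sections(wasm):
    """Return a list of (name, payload) for every custom section in a module."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    pos = 8                         # skip 4-byte magic and 4-byte version
    found = []
    while pos < len(wasm):
        section_id = wasm[pos]
        pos += 1
        size, pos = read_uleb128(wasm, pos)
        end = pos + size
        if section_id == 0:         # custom section: name, then raw bytes
            name_len, name_start = read_uleb128(wasm, pos)
            name_end = name_start + name_len
            name = wasm[name_start:name_end].decode("utf-8")
            found.append((name, wasm[name_end:end]))
        pos = end                   # other sections are skipped unparsed
    return found

# Hand-built module: an empty type section, then one custom section.
name = b".debug_line"
payload = b"\x01\x02\x03\x04"
module = (
    b"\0asm\x01\x00\x00\x00"
    + bytes([1, 1, 0])                                  # type section, 0 types
    + bytes([0, 1 + len(name) + len(payload), len(name)])
    + name
    + payload
)
sections = custom_sections(module)
```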

tschneidereit avatar Mar 22 '18 11:03 tschneidereit

I would like to join the video call if it's possible

chicoxyzzy avatar Mar 30 '18 07:03 chicoxyzzy

Hello all!

I am a little concerned that when the general subject of "debug info for wasm" comes up, folks are only thinking of the stepping debugger use case.

Here is a concrete list of debug info I wished that wasm had when I was writing tooling for wasm or using wasm tooling:

  • Which logical functions are inlined within a given physical function? Within the function's body, which byte ranges can be attributed to each inlined function, and which to the physical function itself?

    • My first use case for this information is in the twiggy code size profiler, where I want to find large functions that are getting inlined many times, and might be causing bloat.

    • My second use case is: when profiling WebAssembly code, sampled stack frames are much less useful than their JavaScript counterparts, because only physical frames are displayed. Profilers should display inlined functions' frames as well, like they do for JavaScript (with assistance from the JS engine, rather than debug info).

    • As a precedent, this information is present in DWARF's .debug_info section, as a DW_TAG_inlined_subroutine entry in the debugging information entry (DIE) tree:

      < 4><0x00000454>          DW_TAG_subprogram
                                  DW_AT_low_pc                0x00016d30
                                  DW_AT_high_pc               <offset-from-lowpc>635
                                  DW_AT_frame_base            len 0x0001: 56: DW_OP_reg6
                                  DW_AT_linkage_name          _ZN12cpp_demangle3ast11BuiltinType8demangle17h174c82f68696be09E
                                  DW_AT_name                  demangle<&mut alloc::vec::Vec<u8>>
                                  DW_AT_decl_file             0x00000001 /home/fitzgen/cpp_demangle/src/ast.rs
                                  DW_AT_decl_line             0x00000c35
                                  DW_AT_type                  <0x00016a94>
      ...
      < 6><0x000004ca>              DW_TAG_lexical_block
                                      DW_AT_low_pc                0x00016d51
                                      DW_AT_high_pc               <offset-from-lowpc>596
      < 7><0x000004d7>                DW_TAG_variable
                                        DW_AT_location              <loclist at offset 0x00009e83 with 1 entries follows>
                  [ 0]< offset pair low-off : 0x00016d4b addr  0x00016d4b high-off  0x00016d59 addr 0x00016d59>DW_OP_breg4+1 DW_OP_stack_value
                                        DW_AT_name                  ty
                                        DW_AT_alignment             0x00000001
                                        DW_AT_decl_file             0x00000001 /home/fitzgen/cpp_demangle/src/ast.rs
                                        DW_AT_decl_line             0x00000c40
                                        DW_AT_type                  <0x00018731>
      < 7><0x000004e8>                DW_TAG_inlined_subroutine
                                        DW_AT_abstract_origin       <0x00000636>
                                        DW_AT_low_pc                0x00016d51
                                        DW_AT_high_pc               <offset-from-lowpc>596
                                        DW_AT_call_file             0x00000001 /home/fitzgen/cpp_demangle/src/ast.rs
                                        DW_AT_call_line             0x00000c40
      < 8><0x000004fc>                  DW_TAG_inlined_subroutine
                                          DW_AT_abstract_origin       <0x00016893>
                                          DW_AT_low_pc                0x00016f63
                                          DW_AT_high_pc               <offset-from-lowpc>58
                                          DW_AT_call_file             0x00000003 /home/fitzgen/cpp_demangle/<write macros>
                                          DW_AT_call_line             0x00000002
      < 9><0x0000050f>                    DW_TAG_formal_parameter
                                            DW_AT_location              len 0x0007: 930810019f9308: DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
                                            DW_AT_abstract_origin       <0x000168a4>
      < 9><0x0000051c>                    DW_TAG_formal_parameter
                                            DW_AT_location              <loclist at offset 0x00009ea8 with 2 entries follows>
                  [ 0]< offset pair low-off : 0x00016f63 addr  0x00016f63 high-off  0x00016f8d addr 0x00016f8d>DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
                  [ 1]< offset pair low-off : 0x00016f8d addr  0x00016f8d high-off  0x00016fab addr 0x00016fab>DW_OP_breg6-16 DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
                                            DW_AT_abstract_origin       <0x000168af>
      < 9><0x00000525>                    DW_TAG_formal_parameter
                                            DW_AT_location              len 0x0007: 930810019f9308: DW_OP_piece 8 DW_OP_constu 1 DW_OP_stack_value DW_OP_piece 8
                                            DW_AT_abstract_origin       <0x000168ba>
      ...
      
  • Is a given function a monomorphization of some generic function?

    • Again, I want this information in twiggy, so I can try and identify generic functions that have been monomorphized "too many" times and are leading to code bloat.

    • DWARF doesn't represent the generic functions themselves, but does have DW_TAG_template_{type,value}_parameter entries, so you can mostly reconstruct this information.

  • Mapping source locations to wasm byte codes and back.

    • The one thing source maps can do :)

    • DWARF's equivalent is in the .debug_line section.

    • For twiggy, I want to say where in the source call graph edges are originating from. If you expect blah_function to be dead code, and removed by the linker, but it isn't, I want to be able to point at a series of "src/foo.rs:12:34 in lolo_method" as the reasons why the blah_function is not dead code and is included in the .wasm binary.
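
To make the first use case concrete, here is a small Python sketch that attributes bytes to inlined functions using only the `DW_AT_low_pc` and `DW_AT_high_pc` values from the DWARF dump above. It is an illustration, not how twiggy works: the `Range` type and `self_bytes` helper are hypothetical, the inlined subroutines are labelled by their `DW_AT_abstract_origin` offsets because the dump does not show their names, and it assumes each entry covers one contiguous range.

```python
from dataclasses import dataclass, field

@dataclass
class Range:
    label: str
    low_pc: int
    size: int                       # DW_AT_high_pc as an offset from low_pc
    children: list = field(default_factory=list)

def self_bytes(node, out=None):
    """Map each label to the bytes it covers minus bytes of nested inlinees."""
    out = {} if out is None else out
    for child in node.children:
        # Sanity check: an inlined range must sit inside its parent's range.
        if not (node.low_pc <= child.low_pc
                and child.low_pc + child.size <= node.low_pc + node.size):
            raise ValueError(f"{child.label} is not nested in {node.label}")
    out[node.label] = node.size - sum(c.size for c in node.children)
    for child in node.children:
        self_bytes(child, out)
    return out

# Values copied from the DW_TAG_subprogram and DW_TAG_inlined_subroutine
# entries in the dump above.
tree = Range("demangle", 0x00016D30, 635, [
    Range("inlined <0x00000636>", 0x00016D51, 596, [
        Range("inlined <0x00016893>", 0x00016F63, 58),
    ]),
])
sizes = self_bytes(tree)
```

With these numbers, only 39 of the physical function's 635 bytes belong to `demangle` itself; the rest comes from code inlined into it, which is exactly the kind of answer a size profiler cannot give from a line table alone.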

I want to highlight that these are just the concrete use cases I have run into so far. As we do more with wasm, this list will only grow. Any kind of instrumentation or static analysis tooling is going to want to consume debug info.

Why do I want to highlight debug info consumer use cases besides stepping debuggers? Because I want to make sure that you don't have to be running the debuggee program just to get information about it. A size profiler, a static analyzer, and an instrumentation-based code coverage tool shouldn't have to run the .wasm binary they are working on.

fitzgen avatar Apr 23 '18 18:04 fitzgen

@littledan I would love to join a video call to discuss this topic :)

fitzgen avatar Apr 23 '18 18:04 fitzgen

Let's set up a video call. Are people available next week? I will set up a doodle if I get enough thumbs up. This would be the week of May 1-5. @fitzgen @littledan @xtuc @chicoxyzzy @gskachkov pinging you specifically since you expressed interest. Everyone else is also invited!

codehag avatar Apr 25 '18 08:04 codehag

I am also very interested, but already overbooked until May 8 included.

sjrd avatar Apr 25 '18 08:04 sjrd

We can expand to the week of May 7-11 (thumbs up on this comment if this works better for you)

codehag avatar Apr 25 '18 08:04 codehag

Hi there, I have a small question about wasm debugging. Since wasm can run on mobile devices, where we can't run DevTools, wasm should support some kind of remote debug protocol that allows debugging wasm modules on mobile devices. Will a remote debug protocol be part of the wasm debug specification, or will it be handled by each browser with its own remote debugging protocol?

gskachkov avatar Apr 25 '18 10:04 gskachkov

@codehag please invite @yurydelendik to this call, too.

tschneidereit avatar Apr 25 '18 14:04 tschneidereit