binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

Support 5 segment source mappings

Open osa1 opened this issue 1 year ago • 3 comments

Support 5-segment source mappings.

Reference: https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format

osa1 avatar Jul 31 '24 11:07 osa1

What are your plans for using this in practice? I'm not aware of other toolchains doing so, so I'm curious.

kripken avatar Aug 07 '24 18:08 kripken

When I created this PR I was using the name segment in dart2wasm to name generated code with the source code function that they belong.

However as you note, browsers don't use the name segments in source maps to give names to stack frames, and I'm unable to find where the name segments are used (if they actually are).

So currently I don't have a use case and this isn't a blocker for me. We may still want to merge it for completeness, the source maps spec allows this and there may be some toolchain that makes use of it.

osa1 avatar Aug 08 '24 07:08 osa1

I see, thanks for the update. In that case, I think I prefer not to merge this, if we can't test it against a known correct implementation, and don't have a current user. But we can leave the PR open, maybe there are other people that could benefit, and might find this and comment.

kripken avatar Aug 08 '24 17:08 kripken

@kripken we now have a use case for this, as described in https://github.com/dart-lang/sdk/issues/56718#issuecomment-2349373136.

The TL;DR is stack traces of crashes in optimized dart2wasm apps are logged in an external service which has access to the source map file.

The source map file is used to map code offsets in the stack trace to names.

Optimized dart2wasm apps don't have function names as names section can add ~70% overhead to binaries.

To support this use case we need to add names to mappings.

Would it be possible to merge this?

osa1 avatar Sep 16 '24 09:09 osa1

@osa1 If there is a use case then we should review and get this landed, I agree, but I don't think I understand the need there yet.

The standard solution I am aware of for that name problem is to store the name mapping on the side, which can be done by saving the name section in addition to the source map file. One way is to save the entire unstripped binary. Another approach is to save just the names themselves, for which e.g. wasm-opt --symbolmap can be used (that is what Emscripten does).

Is there a reason those existing solutions can't work for your use case?

kripken avatar Sep 16 '24 20:09 kripken

In the native world this is done via DWARF debugging information obtained when stripping the production binary (i.e. one obtains a single file that can be used to decode production stack traces). For wasm we don't use DWARF (as that disables certain binaryen optimizations) - so we instead rely on source maps for e.g. line number information.

If the method names came from a symbolmap file, we'd still need to have the source map file as well to get line numbers. It seems very inconvenient to have this metadata distributed across two different files - with different formats, where one needs different tools to decode different components of the frames and then merge this information.

mkustermann avatar Sep 16 '24 20:09 mkustermann

@mkustermann Two files does add some complexity, that is true. But in my experience on the Emscripten side that has not been much of an issue in practice. And as for needing separate tools, we have a small script that can call out to the various tools,

https://github.com/emscripten-core/emscripten/blob/main/emsymbolizer.py

That script can then handle a symbol map, a source map, or DWARF.

To be clear, I'm not saying I'm opposed to this PR. If you feel strongly that it is helpful for you then I can be convinced. I just want to make sure you have considered what I think is the simpler option of using the wasm names section as a symbol map.

kripken avatar Sep 16 '24 22:09 kripken

the simpler option of using the wasm names section as a symbol map

Do you mean simple for binaryen devs or end users? I think the simplicity here is for binaryen but not necessarily for the users of binaryen.

dart2wasm needs to pass the right flags to wasm-opt to generate symbol maps, then Flutter needs to pass the right flags to dart2wasm. (Flutter doesn't give users a way to pass extra compiler flags, so it needs to be updated)

End users will have to deal with 3 files instead of 2, and if they already had the implementation of mapping stack traces to source locations using a separately distributed source map file (as in our use case linked above) they won't be able to reuse much of their code.

osa1 avatar Sep 17 '24 08:09 osa1

@kripken Maybe as some background. The request originated due to sentry.io (a very popular crash reporting framework) to extending their flutter support to flutter web apps compiled to wasm (see sentry's bug: https://github.com/dart-lang/sdk/issues/56718). They probably have existing support for uploading source maps and things just work for dart2js code but not for dart2wasm code.

So the real complexity comes due to the fact that from the lowest level of the stack (the dart2wasm compiler), to flutter tooling, to sentry tooling to sentry API server uploading to sentry API server symbolization code to all need to deal with the extra wasm symbols map. So it may require e.g. sentry's crash reporting service to change their client tooling & server APIs to allow upload 3 files (wasm, source map, wasm symbolmap), change their server code that symbolizes stacks to use different tools to get different information (function names and line numbers) and then combine that.

And this isn't dart/flutter specific. Anyone using WasmGC & Binaryen will use source maps (because DWARF disables optimizations) and would then need to have to deal with this extra symbolmap file all the way up their stack.

It seems like the source map format supports the function names for that exact reason, so why not take advantage of it and avoid this complexity?

mkustermann avatar Sep 17 '24 08:09 mkustermann

Fair enough, as I said, if you feel strongly then I don't object. I'll find time later today to review this in detail.

kripken avatar Sep 17 '24 17:09 kripken

@kripken thanks, ptal.

osa1 avatar Sep 19 '24 14:09 osa1

Current version works as expected when I test it with https://dart-review.googlesource.com/c/sdk/+/386780, however there are a few TODOs I left in the code, which I'll try to fix next.

osa1 avatar Sep 25 '24 12:09 osa1

I believe this should be almost ready now, but I'm not sure how to test changes in module-utils.cpp.

@kripken do we have any tests right now that checks source map info after doing inlining, and after merging modules? The changes in module-utils.cpp will be used when inlining and merging modules, but I can't find where debug info sanity is tested after inlining and merging modules.

osa1 avatar Sep 26 '24 12:09 osa1

It looks like the code above your changes to module-utils.cpp was added in f44912bf23 (#6372), which includes tests. See specifically the test code in test/lit/merge/sourcemap.wat* for merging.

I don't see specific tests for inlining + source maps, but one could be added alongside e.g. test/lit/debug/source-map-smearing.wast, by adding --inlining. Though perhaps the test can be even simpler, without writing a binary in the middle? Just adding ;;@ annotations to an inlining test like test/lit/passes/inlining_all-features.wast should check that we move the annotations around properly.

kripken avatar Sep 26 '24 17:09 kripken

Thanks @kripken.

This should be ready for review now.

I've added a separate test for inlining instead of modifying one of the existing tests, as there weren't any existing test that checks specifically source maps AFAICS.

The renumbering in lines and columns in existing tests is to make sure each source map annotation has a unique line and column number, to make sure we don't accidentally use numbers of a wrong annotation.

The TODO in the code is copy/paste from a few lines above. When writing strings to the .map files we don't escape characters according to the JSON spec. It's an existing issue.

osa1 avatar Sep 27 '24 09:09 osa1

@kripken I've addressed the comments but I'm not sure how to update the C API with an optional uint32_t parameter, see my question above.

osa1 avatar Oct 01 '24 08:10 osa1