js_of_ocaml icon indicating copy to clipboard operation
js_of_ocaml copied to clipboard

Optimize sourcemap processing

Open OlivierNicole opened this issue 1 year ago • 20 comments

This improves the efficiency of sourcemap processing, chiefly during linking, but also in serializing and deserializing sourcemaps.

Currently, when linking with source maps enabled, js_of_ocaml will decode the source maps for each input file completely into an in-memory explicit mapping, then merge them, then re-encode the merged sourcemap. The merging algorithm involves superlinear sorting operations. However, the format of sourcemaps allows to avoid sorting for the concatenation of generated files.

In addition, all operations performed by linking (concatenation of files, removal and addition of lines) can be performed directly on the encoded form, which further avoids a full parsing to an in-memory representation.

When compiling js_of_ocaml with itself, the final linking step is accelerated by a speedup of about 2.3x with OCaml 5.2.0+fp:

recent trunk (bfbeb692c) this PR
331.8 ms ± 6.1 ms 143.3 ms ± 1.1 ms

fix https://github.com/ocsigen/js_of_ocaml/issues/1446

OlivierNicole avatar May 23 '24 13:05 OlivierNicole

I can’t seem to reproduce the CI failure:

File "toplevel/examples/lwt_toplevel/dune", line 2, characters 8-16:
2 |  (names toplevel)
            ^^^^^^^^
(cd _build/default/toplevel/examples/lwt_toplevel && ../../../compiler/bin-js_of_ocaml/js_of_ocaml.exe link --source-map-inline -o toplevel.bc.js .toplevel.eobjs/jsoo/toplevel.bc.runtime.js ../../../.js/!effects+toplevel/stdlib/stdlib.cma.js ../../../.js/!effects+toplevel/compiler-libs.common/ocamlcommon.cma.js ../../../.js/!effects+toplevel/compiler-libs.bytecomp/ocamlbytecomp.cma.js ../../../.js/!effects+toplevel/menhirLib/menhirLib.cma.js ../../../.js/!effects+toplevel/gen/gen.cma.js ../../../.js/!effects+toplevel/sedlex/sedlex.cma.js ../../../.js/!effects+toplevel/yojson/yojson.cma.js ../../../compiler/lib/.js_of_ocaml_compiler.objs/jsoo/!effects+toplevel/js_of_ocaml_compiler.cma.js ../../../lib/runtime/.jsoo_runtime.objs/jsoo/!effects+toplevel/jsoo_runtime.cma.js ../../../lib/js_of_ocaml/.js_of_ocaml.objs/jsoo/!effects+toplevel/js_of_ocaml.cma.js ../../../.js/!effects+toplevel/uutf/uutf.cma.js ../../../.js/!effects+toplevel/re/re.cma.js ../../../.js/!effects+toplevel/tyxml.functor/tyxml_f.cma.js ../../../.js/!effects+toplevel/tyxml/tyxml.cma.js ../../../.js/!effects+toplevel/react/react.cma.js ../../../.js/!effects+toplevel/reactiveData/reactiveData.cma.js ../../../lib/tyxml/.js_of_ocaml_tyxml.objs/jsoo/!effects+toplevel/js_of_ocaml_tyxml.cma.js ../../../compiler/lib-dynlink/.js_of_ocaml_compiler_dynlink.objs/jsoo/!effects+toplevel/js_of_ocaml_compiler_dynlink.cma.js ../../../.js/!effects+toplevel/compiler-libs.toplevel/ocamltoplevel.cma.js ../../lib/.js_of_ocaml_toplevel.objs/jsoo/!effects+toplevel/js_of_ocaml_toplevel.cma.js ../../../.js/!effects+toplevel/lwt/lwt.cma.js ../../../lib/lwt/.js_of_ocaml_lwt.objs/jsoo/!effects+toplevel/js_of_ocaml_lwt.cma.js ../../../.js/!effects+toplevel/graphics/graphics.cma.js ../../../lib/deriving_json/.js_of_ocaml_deriving.objs/jsoo/!effects+toplevel/js_of_ocaml_deriving.cma.js ../../../.js/!effects+toplevel/str/str.cma.js ../../../.js/!effects+toplevel/dynlink/dynlink.cma.js ../../../lib/lwt/graphics/.graphics_js.objs/jsoo/!effects+toplevel/graphics_js.cma.js ../../../.js/!effects+toplevel/ocaml-compiler-libs.common/ocaml_common.cma.js ../../../.js/!effects+toplevel/ppxlib.astlib/astlib.cma.js ../../../.js/!effects+toplevel/stdlib-shims/stdlib_shims.cma.js ../../../.js/!effects+toplevel/ppxlib.ast/ppxlib_ast.cma.js ../../../.js/!effects+toplevel/ocaml-compiler-libs.shadow/ocaml_shadow.cma.js ../../../.js/!effects+toplevel/ppxlib.print_diff/ppxlib_print_diff.cma.js ../../../.js/!effects+toplevel/ppx_derivers/ppx_derivers.cma.js ../../../.js/!effects+toplevel/ppxlib.traverse_builtins/ppxlib_traverse_builtins.cma.js ../../../.js/!effects+toplevel/sexplib0/sexplib0.cma.js ../../../.js/!effects+toplevel/ppxlib.stdppx/stdppx.cma.js ../../../.js/!effects+toplevel/ppxlib/ppxlib.cma.js ../../../ppx/ppx_js/as-lib/.ppx_js.objs/jsoo/!effects+toplevel/ppx_js.cma.js ../../../ppx/ppx_js/lib/.ppx_js_rewriter.objs/jsoo/!effects+toplevel/ppx_js_rewriter.cma.js .toplevel.eobjs/jsoo/dune__exe.cmo.js .toplevel.eobjs/jsoo/dune__exe__B64.cmo.js .toplevel.eobjs/jsoo/dune__exe__Colorize.cmo.js .toplevel.eobjs/jsoo/dune__exe__Graphics_support.cmo.js .toplevel.eobjs/jsoo/dune__exe__Ocp_indent.cmo.js .toplevel.eobjs/jsoo/dune__exe__Indent.cmo.js .toplevel.eobjs/jsoo/dune__exe__Ppx_support.cmo.js .toplevel.eobjs/jsoo/dune__exe__Toplevel.cmo.js ../../../.js/!effects+toplevel/stdlib/std_exit.cmo.js --linkall)
../../../compiler/bin-js_of_ocaml/js_of_ocaml.exe: You found a bug. Please report it at https://github.com/ocsigen/js_of_ocaml/issues :
Error: File "compiler/lib/source_map_io.yojson.ml", line 109, characters 12-18: Assertion failed

This test builds fine on my machine.

Edit: never mind, it was an off-by-one error in an assertion.

OlivierNicole avatar May 23 '24 14:05 OlivierNicole

One test is failing because reading “index maps” (composite source maps) is not supported for now—something that is not required for js_of_ocaml to function. I will fix this test soon.

OlivierNicole avatar May 23 '24 15:05 OlivierNicole

I just tested this change on an incremental build for one of our largest executables and am getting this timing for js_of_ocaml link.

Pre: 2.469s Post: 1.301s

Very cool!

rickyvetter avatar May 24 '24 15:05 rickyvetter

This is ready for review. I don’t quite understand the build problems on the 5.1 CI, but they don’t seem related to this PR.

OlivierNicole avatar Jun 04 '24 14:06 OlivierNicole

My bad, it actually was due to a missing :standard in a flags field of a Dune stanza. The CI passes now.

OlivierNicole avatar Jun 11 '24 11:06 OlivierNicole

Is anything blocking this PR still?

OlivierNicole avatar Jul 16 '24 10:07 OlivierNicole

My understanding is that the Line_edits change is what gives use the most part of the improvement. Could we achieve similar perf using sections (Index sourcemap) ?

I hoped so initially, but it turns out that Link_js may remove or add metadata comments in the middle of a compilation unit, which pretty much killed my hope of simply using index maps.

OlivierNicole avatar Jul 30 '24 15:07 OlivierNicole

I’m trying to answer some of your comments but my answers are labelled as “pending”, I’m not sure why.

OlivierNicole avatar Jul 30 '24 16:07 OlivierNicole

Never mind, apparently I had started a review at some point in the past and never completed it, and apparently Github considers all your comments henceforth as pending

OlivierNicole avatar Jul 31 '24 09:07 OlivierNicole

Just to make sure we are on the same page, my intention is to wait for #1640 to be merged, rebase on it, and then address the rest of your new review (thanks!).

OlivierNicole avatar Aug 02 '24 15:08 OlivierNicole

Just to make sure we are on the same page, my intention is to wait for #1640 to be merged, rebase on it, and then address the rest of your new review (thanks!).

What do you think about the following, on top of #1640.

  • drop the indexed sourcemap for now (it seems it's not used)
  • first merge the delayed/on demand decoding that replace the sourcemap change from #1654
  • keep this PR for "edit un-decoded mapping" only.

hhugo avatar Aug 02 '24 20:08 hhugo

drop the indexed sourcemap for now (it seems it's not used)

It is used, to represent the sourcemap of the linker output (in Link_js).

OlivierNicole avatar Aug 02 '24 21:08 OlivierNicole

#1640 has been merged, I'll let you rebase this one

hhugo avatar Sep 04 '24 08:09 hhugo

Done

OlivierNicole avatar Sep 04 '24 13:09 OlivierNicole

I'm not very happy with the callback based approach used on Link_js. Without thinking much about it, I would assume that one could reconstruct the edit instruction inside let copy ic oc = ... only.

Maybe we could move the handling/parsing of Sourcemap index into a separate PR to remove some noise in here.

hhugo avatar Sep 05 '24 10:09 hhugo

Regarding the representations of edits. We currently have per line instructions. Have you considered using a different approach (closer to what's done on master with reloc) using "copy n lines starting at pos n into position m" ?

hhugo avatar Sep 05 '24 10:09 hhugo

I considered it, but at the time discarded it as early optimisation. Now that the main source of slowdown as been eliminated, it might be worth trying.

OlivierNicole avatar Sep 05 '24 12:09 OlivierNicole

Regarding the representations of edits. We currently have per line instructions. Have you considered using a different approach (closer to what's done on master with reloc) using "copy n lines starting at pos n into position m" ?

Actually, what I had considered was to have the edits represented as a list of “copy / delete n lines” rather than a long list of “copy / delete 1 line”.

What you suggest would make Source_map.Mappings.edit no longer linear. In your suggestion, some sorting is involved, whereas in what is currently done, the destination line is implicitly represented by the position of the edit in the list of edits.

OlivierNicole avatar Oct 10 '24 15:10 OlivierNicole

My understanding is that the Line_edits change is what gives use the most part of the improvement. Could we achieve similar perf using sections (Index sourcemap) ?

I hoped so initially, but it turns out that Link_js may remove or add metadata comments in the middle of a compilation unit, which pretty much killed my hope of simply using index maps.

Can we maybe revisit this approach. Maybe we could teach link_js to copy blocks of lines that do not mess up with sourcemap ? If we really want to drop comments, we could emit an empty line instead. We would probably need to emit Index map when compiling cma files so that we have one sourcemap per unit and we can easily drop dead units when linking.

hhugo avatar Oct 15 '24 14:10 hhugo

I've opened #1715 instead. It removes the need to change encoded mappings.

hhugo avatar Oct 17 '24 12:10 hhugo

Closing in favor of #1715

hhugo avatar Oct 22 '24 08:10 hhugo