[BUG] js_of_ocaml is excessively memory hungry

Open JasonGross opened this issue 1 year ago • 3 comments

Describe the bug

js_of_ocaml is very cool! I use it on CI to generate a webpage. However, I cannot use it on the new GitHub Actions arm64 macOS runners, which have only 7 GB of RAM, because it sometimes uses 8-9 GB of RAM to generate a single .js file. For example, here is a table of build times and peak memory usage on Linux:

     Time |   Peak Mem | File Name                                         
---------------------------------------------------------------------------
15m11.31s | 8904232 ko | Total Time / Peak Mem                             
---------------------------------------------------------------------------
 4m46.85s | 8808932 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.js       
 4m37.20s | 7151532 ko | ExtractionJsOfOCaml/fiat_crypto.js                
 4m30.42s | 8904232 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js  
 0m25.80s | 3195952 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte
 0m25.32s | 3193600 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte     
 0m24.60s | 2814364 ko | ExtractionJsOfOCaml/fiat_crypto.byte              
 0m00.38s |  102780 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.cmi      
 0m00.37s |   98468 ko | ExtractionJsOfOCaml/fiat_crypto.cmi               
 0m00.37s |  102360 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.cmi 

It is similar on macOS, and a bit better on Debian sid.
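For anyone who wants to reproduce measurements like those in the table, here is a minimal Python sketch. It runs a command as a child process and reports wall-clock time and peak resident memory using the standard `resource` module. The helper name `measure` is hypothetical, and the unit of `ru_maxrss` is platform-dependent (kilobytes on Linux, bytes on macOS), so treat the conversion as an assumption to verify on your system.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return (elapsed_seconds, peak_rss_kib, returncode).

    ru_maxrss for RUSAGE_CHILDREN is the largest peak among all children
    waited for so far, so use a fresh interpreter per measured command.
    """
    start = time.monotonic()
    proc = subprocess.run(cmd)
    elapsed = time.monotonic() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # macOS reports bytes; Linux reports kilobytes
    return elapsed, peak, proc.returncode
```

For example, `measure(["js_of_ocaml", "foo.byte", "-o", "foo.js"])` would time one compilation, where `foo.byte` stands in for any bytecode file.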

I invoke js_of_ocaml with --source-map --no-inline --enable=effects, and the OCaml compiler with -package js_of_ocaml -package unix -w -20 -g.

For the near future (until artifacts expire), the build artifacts page contains generated .js files (fiat-html-js-of-ocaml), .ml source files (ExtractionJsOfOCaml-source-master), and compiled files (ExtractionJsOfOCaml-master-ocaml-4.11.1).

Expected behavior

I expect there to be a way to make the js_of_ocaml pipeline fit in under 7 GB of RAM, with a flag if necessary.

Versions

js_of_ocaml 5.7.2, ocaml 4.11.1

JasonGross avatar May 09 '24 05:05 JasonGross

  • --no-inline is no longer necessary (since js_of_ocaml.5.7.0).
  • dealing with debug info seems to be slow in your use case. Removing --source-map should speed up your build. I'll try to investigate.
  • adding --disable globaldeadcode should give you a good speedup as well.

With the changes mentioned above, I get the following for ExtractionJsOfOCaml/bedrock2_fiat_crypto.js:

112.97user 1.26system 1:54.24elapsed 99%CPU (0avgtext+0avgdata 4581104maxresident)k
0inputs+21904outputs (0major+1275732minor)pagefaults 0swaps
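The figures above are in the default output format of GNU `time`, which reports peak resident memory in kilobytes as `maxresident`. A small Python sketch, assuming that format, extracts the value and compares it with the 7 GB limit from the original report. The function name is illustrative, and treating "7 GB" as 7 GiB is an assumption.

```python
import re


def max_resident_kib(time_output):
    """Return the 'maxresident' value (in KiB) from GNU time's default output."""
    match = re.search(r"(\d+)maxresident\)k", time_output)
    if match is None:
        raise ValueError("no maxresident field found")
    return int(match.group(1))


line = ("112.97user 1.26system 1:54.24elapsed 99%CPU "
        "(0avgtext+0avgdata 4581104maxresident)k")
peak_kib = max_resident_kib(line)
peak_gib = peak_kib / 1024**2          # KiB -> GiB
fits = peak_kib * 1024 < 7 * 1024**3   # assumed limit: 7 GiB
print(f"peak: {peak_gib:.2f} GiB, fits under limit: {fits}")
```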

hhugo avatar May 13 '24 14:05 hhugo

With #1614, one no longer needs to disable globaldeadcode, at the cost of extra memory usage.

78.16user 1.85system 1:20.13elapsed 99%CPU (0avgtext+0avgdata 6034576maxresident)k
0inputs+19696outputs (0major+1631738minor)pagefaults 0swaps

I'll try to improve #1614

hhugo avatar May 13 '24 14:05 hhugo

I've updated #1614, I now get

73.86user 1.00system 1:14.87elapsed 99%CPU (0avgtext+0avgdata 4581396maxresident)k
0inputs+19696outputs (0major+1209270minor)pagefaults 0swaps

I still need to investigate the sourcemap issue.

Can you test #1614 and confirm that it solves part of your issue?

hhugo avatar May 13 '24 14:05 hhugo

@hhugo Regarding this, I have been working on the sourcemap slowness and will open a PR today.

OlivierNicole avatar May 23 '24 08:05 OlivierNicole

@JasonGross, any luck with #1614 ?

hhugo avatar Jun 01 '24 21:06 hhugo

I have not had a chance to try it, but if it works on your end, I don't see why it would be any different on GitHub Actions. (The files I linked to are the ones I actually use, not simplified examples.) But I can set up GHA to use the PR. Should I just clone the repo and opam pin add . on that branch?

JasonGross avatar Jun 01 '24 22:06 JasonGross

#1617 may help too if the memory consumption happens to be in sourcemaps. I find that it nearly halves the peak memory usage when linking JSOO using itself.

OlivierNicole avatar Jun 02 '24 14:06 OlivierNicole

@JasonGross, I think you can just do

opam pin add js_of_ocaml-compiler https://github.com/ocsigen/js_of_ocaml.git#speedup

hhugo avatar Jun 10 '24 15:06 hhugo

I've set it up on CI:

  • pre (specifically here)
  • post speedup (mit-plv/fiat-crypto#1922) (#1614) (still in progress)
  • post optim_sourcemap_link (mit-plv/fiat-crypto#1923) (#1617) (still in progress)

JasonGross avatar Jun 12 '24 03:06 JasonGross

It looks like both #1614 and #1617 make both the run time and the peak memory usage worse. I haven’t worked on #1614, but that surprises me a lot in the case of #1617. Are these tests runnable on Linux? I may try to inspect them locally.

OlivierNicole avatar Jun 17 '24 12:06 OlivierNicole

Not all CI jobs have been updated.

Here is what I see for #1614

1m28.92s | 4461616 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

hhugo avatar Jun 17 '24 13:06 hhugo

And for #1617

4m47.42s | 5253024 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

hhugo avatar Jun 17 '24 13:06 hhugo

compared to

6m08.88s | 7720996 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js
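To make the comparison concrete, the three measurements quoted for with_bedrock2_fiat_crypto.js can be turned into relative figures. This Python sketch only does arithmetic on the numbers above; the helper name is hypothetical, and `ko` is assumed to denote kilobytes.

```python
def parse_minutes_seconds(text):
    """Convert a string such as '6m08.88s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# (time, peak memory in ko) as reported in the thread.
runs = {
    "baseline": ("6m08.88s", 7720996),
    "#1614": ("1m28.92s", 4461616),
    "#1617": ("4m47.42s", 5253024),
}

base_time = parse_minutes_seconds(runs["baseline"][0])
base_mem = runs["baseline"][1]

for name, (time_text, mem) in runs.items():
    speedup = base_time / parse_minutes_seconds(time_text)
    mem_saved = 1 - mem / base_mem
    print(f"{name}: {speedup:.2f}x speed, {mem_saved:.0%} less peak memory")
```

On these single runs, both branches come out faster and lighter than the baseline, although one sample per configuration says nothing about run-to-run variance.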

hhugo avatar Jun 17 '24 13:06 hhugo

@OlivierNicole, I would expect your PR to only affect separate compilation during the link step, but I don't think separate compilation is involved here. What part of your PR would improve the situation during whole-program compilation?

hhugo avatar Jun 17 '24 13:06 hhugo

> Not all CI jobs have been updated.
>
> Here is what I see for #1614
>
> 1m28.92s | 4461616 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

I didn’t quite follow which of the many jobs to inspect to find the info, but I trust that your numbers are right.

> @OlivierNicole, I would expect your PR to only affect separate compilation during the link step but I don't think separate compilation is involved here. What part of your PR would improve the situation during whole program compilation ?

I’m honestly not sure. Looking into it.

OlivierNicole avatar Jun 17 '24 14:06 OlivierNicole

I switched to `Stringlit and Yojson.Raw (rather than `String and Yojson.Basic) because it saves a non-negligible amount of time on the parsing and the writing of the mappings fields of source maps. Essentially, Yojson.Basic.to_string (`String s) checks for special characters or Unicode code points in the string, which takes a surprising amount of time and is unnecessary on mappings, since they contain only base64 digits, commas and semicolons.
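The idea can be illustrated outside OCaml. The following Python sketch is an analogy, not the code from #1617: it shows that for a string made only of base64 characters, commas and semicolons, wrapping it in quotes directly yields the same JSON as a generic encoder that scans every character for escapes. The names `SAFE_CHARS` and `quote_raw` and the sample mappings string are made up for illustration.

```python
import json
import string

# Characters that can appear in a source map "mappings" field:
# the base64 alphabet plus ',' and ';'.
SAFE_CHARS = set(string.ascii_letters + string.digits + "+/,;")


def quote_raw(mappings):
    """Wrap a mappings string in quotes without scanning for escapes.

    Only valid when every character is in SAFE_CHARS.
    """
    return '"' + mappings + '"'


mappings = "AAAA,SAASA,GAAG;AACT,IAAI"
assert set(mappings) <= SAFE_CHARS
# For such input, the generic encoder and the raw quoting agree.
assert json.dumps(mappings) == quote_raw(mappings)
```

The shortcut is only safe because the content is known in advance; a string containing a quote or a backslash would produce invalid JSON if quoted this way.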

I’m not sure it explains it all though. Trying to profile locally.

OlivierNicole avatar Jun 17 '24 15:06 OlivierNicole

> Are these tests runnable on Linux?

Yes. The cheapest way to run them is to download any of the artifacts labeled ExtractionJsOfOCaml-source* from our CI. These artifacts contain a handful of self-contained .ml files, the ones that we want to turn into .js files. I gave the flags I use in the initial post.

The expensive way to run the tests is to clone the repo, do opam install coq, and then run something like make js-of-ocaml.

JasonGross avatar Jun 17 '24 23:06 JasonGross

I can’t reproduce a significant difference in either run time or profile between master and #1617. The time spent on source maps is negligible compared to the time spent optimizing. I’m starting to suspect that the CI run times have a huge variance.

OlivierNicole avatar Jun 18 '24 09:06 OlivierNicole

P.S. I’ve done the test on with_bedrock2_fiat_crypto. Peak memory usage is not significantly affected, either.

OlivierNicole avatar Jun 18 '24 14:06 OlivierNicole

#1614 has been merged. Reopen if you still have issues.

hhugo avatar Jun 28 '24 09:06 hhugo