WIP OCaml compiler benchmarks
A big issue to keep track of the OCaml compiler benchmarks:
The latest work is on this branch https://github.com/art-w/ocaml/tree/cb-comanche and is structured as follows:
- `bench.Dockerfile` to download the external repositories.
- `make bench` to run the repositories' compilation, then the testsuite `make bench` to run the light tests.
- `bench.sh` to compile the repositories and record metrics with `-dtimings`.
All the benchmarks write their output to a temporary file in `testsuite/tests/benchmarks/` as tab-separated `metric_name\tvalue` pairs, where the same `metric_name` can be repeated if a benchmark runs multiple times. The testsuite then converts this to the expected JSON with `to_json.awk` (called here by the testsuite `make bench`).
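The actual `to_json.awk` is not reproduced here, but a minimal sketch of this kind of TSV-to-JSON conversion (the metric names, sample values, and output shape below are illustrative assumptions, not the real script's behaviour) could look like:

```shell
# Hypothetical sketch of converting "metric_name<TAB>value" lines to JSON.
# Repeated metric names are grouped into one array, as described above;
# the exact JSON shape produced by to_json.awk may differ.
json=$(printf 'compile_time\t1.50\ncompile_time\t1.47\nbinary_size\t2048\n' |
  awk -F'\t' '
    { values[$1] = values[$1] (values[$1] ? "," : "") $2 }
    END {
      printf "["
      sep = ""
      for (name in values) {
        printf "%s{\"name\": \"%s\", \"value\": [%s]}", sep, name, values[name]
        sep = ", "
      }
      print "]"
    }')
echo "$json"
```

Note that `for (name in values)` iterates in an unspecified order in awk, so the order of metrics in the output may vary between runs.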
There are a bunch of things that should be improved:
- [x] ~~Disable `comanche` in production for this repo, it's too unstable! Also reduce the `NB_RUN` to 2 or 3, and remove all the `-j 128` in that file as it's too much for `autumn`, which already produces tight metrics.~~
- [x] Output the JSON metrics as soon as we record them rather than at the final step, to get faster feedback.
- [x] Produce graph overlays with https://github.com/ocurrent/current-bench/pull/273: currently we only keep the total of `-dtimings`, but it actually produces timings for all the internal steps (parsing, typing, compilation, etc.) that would help understand where regressions happen.
- [x] Similarly, we only keep track of the total binary size, but we should categorize it by file extension.
- [x] Add units to the graphs (seconds or octets)
- [ ] The `bench.Dockerfile` should use fixed versions of the repositories, not "the latest", as otherwise we don't know whether regressions happened because of the PR or because of an update to the external repo.
- [ ] Rebase to trunk... It's problematic because trunk is often not compatible with popular libraries.
- [x] We'll need https://github.com/ocurrent/current-bench/issues/247 to report breaking changes (like above)
- [ ] Add more interesting packages; there are some failed attempts in the shell script.
- [ ] Add metrics for the bytecode compiler (see request)
- [ ] Clean up all the mess I made while trying to set up the custom switch, the repositories, and the replaying of git history. Most of the work happened in new files (except the new `bench` targets in the Makefiles), so it shouldn't be too hard to create a "clean commit" once we are done with the experiments.
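For the version-pinning task, one common approach is to check out a fixed commit in `bench.Dockerfile` instead of cloning whatever the default branch currently points to. The base image, repository URL, and commit hash below are placeholders for illustration, not the actual ones used:

```dockerfile
# Hypothetical sketch: pin an external repository to a fixed commit in
# bench.Dockerfile. Image name, URL, and hash are placeholders only.
FROM ocaml/opam:latest
RUN git clone https://github.com/example/some-package.git /bench/some-package \
 && git -C /bench/some-package checkout 1234567890abcdef1234567890abcdef12345678
```

Bumping the pinned hash then becomes an explicit commit in this repo, so a regression can always be attributed either to a compiler PR or to that bump.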
If you start working on one of those things, please add a comment! Also let me know if I can add anything to clarify how it currently works or any of the tasks. The easiest way to test your modifications is by submitting PRs to https://github.com/art-w/ocaml and checking the results at https://autumn.ocamllabs.io/art-w/ocaml
I'm disabling the `comanche` stuff right now since it's blocking the other issues.
> Add units to the graphs (seconds or octets)

I'll work on adding these units to the graphs. I'll also switch to using the v2 benchmark JSON schema while working on this.
EDIT: Opened https://github.com/art-w/ocaml/pull/6
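For reference, a metric entry carrying units in the benchmark JSON looks roughly like this; the benchmark and metric names and values are made up, and the exact field names should be checked against the current-bench documentation rather than taken from this sketch:

```json
{
  "name": "ocaml-benchmarks",
  "results": [
    {
      "name": "dune",
      "metrics": [
        { "name": "compile_time", "value": 42.1, "units": "s" },
        { "name": "binary_size", "value": 1048576, "units": "octets" }
      ]
    }
  ]
}
```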
> Rebase to trunk... It's problematic because it is often not compatible with popular libraries. We'll need https://github.com/ocurrent/current-bench/issues/247 to report breaking changes.
Opened #320 to address this.
> Produce graph overlays with #273: currently we only keep the total of `-dtimings`, but it actually produces timings for all the internal steps (parsing, typing, compilation, etc.) that would help understand where regressions happen.
@art-w when you mention overlay graphs, would it be sufficient to produce graphs for the first level of the "hierarchy"? For `pp.ml`, for instance: parsing, typing, transl, and generate.
```
0.046s pp.ml
  0.002s parsing
    0.002s parser
  0.014s typing
  0.001s transl
  0.029s generate
    0.001s cmm
    0.011s compile_phrases
      0.001s selection
      0.001s cse
      0.001s liveness
      0.002s spill
      0.004s regalloc
      0.001s emit
      0.001s other
    0.012s assemble
    0.004s other
0.036s dyn.ml
  0.001s parsing
    0.001s parser
  0.013s typing
  0.001s transl
  0.021s generate
    0.006s compile_phrases
      0.001s cse
      0.001s spill
      0.002s regalloc
      0.001s other
    0.010s assemble
    0.004s other
0.008s stdune__.ml
  0.002s typing
  0.006s generate
    0.006s assemble
0.003s stdune__ansi_color.mli
  0.002s typing
0.001s stdune__either.mli
0.006s stdune__list.mli
  0.004s typing
  0.002s other
0.012s stdune__loc0.ml
  0.002s typing
  0.010s generate
    0.003s compile_phrases
      0.002s emit
    0.007s assemble
    0.001s other
0.002s stdune__code_error.mli
  0.001s typing
0.012s stdune__code_error.ml
  0.003s typing
  0.008s generate
    0.001s compile_phrases
    0.007s assemble
    0.001s other
  0.001s other
0.001s other
```
Yes that sounds good! I don't think we'll get correct results if we sum the really small rounded numbers (also the stats printer skips the steps that take less than 0.001s)
There are also some "toplevel" `other` entries that I ended up ignoring (not affiliated with any file), but they represent a significant amount of time, so perhaps we should also report them?
https://github.com/art-w/ocaml/pull/7 implements overlaid graphs
https://github.com/art-w/ocaml/pull/8 outputs the JSON metrics as soon as they are recorded
https://github.com/art-w/ocaml/pull/8 implements overlaid graphs for binary sizes