big performance regression between 2.2.2 and 2.4.1
We are using odig to create the doc for our whole repo in one go. The process used to take about 8 minutes when we were using odoc 2.2.2. It takes north of 30 minutes since we moved to odoc 2.4.1. I can't share a wide scale reproduction at this stage.
is it a known problem?
odig log -s --order-by dur could give hints.
Thanks! The top results (taking more than 1m) when running this command with odoc 2.4.1 are
[72202:spawn 29min35s billing e:a020a11be2815ba8] ['/home/me/Code/ahrefs/monorepo/_opam/
bin/odoc' 'html-targets' '-I' 'some_lib' ... '-I' '/home/me/Code/ahrefs/monorepo/_opam/var/c
ache/odig/odoc/ocaml/' '-o' '/ho
me/me/Code/ahrefs/monorepo/_opam/var/cache/odig/html' '/home/me/Code/ahrefs/monorepo/_op
am/var/cache/odig/odoc/billing/ahrefs/ahrefs.odoc' > '/home/me/Code/ahrefs/monorepo/_opa
m/var/cache/odig/odoc/billing/ahrefs/ahrefs.html.writes'][0]
[105394:spawn 29min16s billing e:d304f41d619fd378] ['/home/me/Code/ahrefs/monorepo/_opam
/bin/odoc' 'html' '--theme-uri' '_odoc-theme' '-o' '/home/me/Code/ahrefs/monorepo/_opam/
var/cache/odig/html' '/home/me/Code/ahrefs/monorepo/_opam/var/cache/odig/odoc/billing/ah
refs/ahrefs.odoc' '-I' 'some_lib' ... '-I' '/home/me/Code/ahrefs/monorepo/_opam/var/c
ache/odig/odoc/ocaml/'][0]
There are about 20 -I some internal ahrefs lib that I've truncated because they just contain internal names.
Some of those libs are fairly large (in term of number of files, lines of code, but also size of the compilation artifacts). For example there's at least one cmxs of 50mb.
This billing lib is using functors pretty heavily.
I don't know what could cause that, it seems a git bisect would help...
As a first bisection, how does odoc 2.3 behave?
2.3.0 is already displaying the regression. I'll try to proceed further with the bisection.
bisection is complicated because of the external dependency on odoc parser. Some commit like 29810eb0 do not compile with odoc parser 2.0.0 or 2.3.0 or 2.3.1. Is there a convenient way to know which odoc-parser to use for a given odoc commit?
odoc$ dune build -p odoc
(cd _build/default && /home/me/Code/ahrefs/monorepo2/_opam/bin/ocamlc.opt -w -40 -g -w -18-53 -w -50 -g -bin-annot -I src/model/.odoc_model.objs/byte -I /home/me/Code/ahrefs/monorepo2/_opam/lib/astring -I /home/me/Code/ahrefs/monorepo2/_opam/lib/camlp-streams -I /home/me/Code/ahrefs/monorepo2/_opam/lib/ocaml/compiler-libs -I /home/me/Code/ahrefs/monorepo2/_opam/lib/odoc-parser -I /home/me/Code/ahrefs/monorepo2/_opam/lib/result -intf-suffix .ml -no-alias-deps -open Odoc_model__ -o src/model/.odoc_model.objs/byte/odoc_model__Semantics.cmo -c -impl src/model/semantics.ml)
File "src/model/semantics.ml", line 232, characters 26-42:
232 | | { value = `Code_block (metadata, code); location } ->
^^^^^^^^^^^^^^^^
Error: This pattern matches values of type 'a * 'b
but a pattern was expected which matches values of type
Odoc_parser.Ast.code_block
(cd _build/default && /home/me/Code/ahrefs/monorepo2/_opam/bin/ocamlopt.opt -w -40 -g -w -18-53 -w -50 -g -I src/model/.odoc_model.objs/byte -I src/model/.odoc_model.objs/native -I /home/me/Code/ahrefs/monorepo2/_opam/lib/astring -I /home/me/Code/ahrefs/monorepo2/_opam/lib/camlp-streams -I /home/me/Code/ahrefs/monorepo2/_opam/lib/ocaml/compiler-libs -I /home/me/Code/ahrefs/monorepo2/_opam/lib/odoc-parser -I /home/me/Code/ahrefs/monorepo2/_opam/lib/result -intf-suffix .ml -no-alias-deps -open Odoc_model__ -o src/model/.odoc_model.objs/native/odoc_model__Semantics.cmx -c -impl src/model/semantics.ml)
File "src/model/semantics.ml", line 232, characters 26-42:
232 | | { value = `Code_block (metadata, code); location } ->
^^^^^^^^^^^^^^^^
Error: This pattern matches values of type 'a * 'b
but a pattern was expected which matches values of type
Odoc_parser.Ast.code_block
Argh, that is very annoying yes...
No, I don't think there is a good way to know that.. However, it is likely to be one of the merged PR from odoc-parser since the last release, that is either:
- not pinned,
- https://github.com/ocaml-doc/odoc-parser/commit/370f1cf044d7b2b89de35f72bb168c5fca66864d for syntax for tables,
- https://github.com/ocaml-doc/odoc-parser/commit/92d15eaf7b82c7cec2c3b702cd6b22c26a208c1c for
@hiddentag - https://github.com/ocaml-doc/odoc-parser/commit/f12420abc6a0c0942475607a336e3430dee361be for "better code block syntax"
In the case you are hitting, you likely need to pin odoc-parser to the ocaml-doc/odoc-parser repository at commit https://github.com/ocaml-doc/odoc-parser/commit/f12420abc6a0c0942475607a336e3430dee36. I hope there won't be intermediate odoc commit which requires pinning outside of those four options...
Thanks a lot for the involved investigation!
So I haven't found which commit introduced the regression yet. But I found that master doesn't have the issue! It seems to have been fixed in 13ddc152e22820c4f1d45417311295ab2a9aee0b.
It looks like the regression came over multiple commits. But I struggle to figure out the worst one yet.
Baseline of my test with 2.2.2 is a bit more than 40s.
03005263 already slow at that commit, but not as bad as with 2.3.0. My test takes 57s. c60f45b0cad23f94ec42d94399048dc74f95b34c also show a slowdown. My test takes 50s.
There is one or multiple commits later on making things even worse.
92d15ea from odoc-parser for most of my bisecting when odoc-parser is not part of the odoc repo.
Digged a bit further but I could only narrow it down to a bunch of commits which ends with failures
odoc: internal error, uncaught exception:
File "src/xref2/tools.ml", line 1600, characters 9-15: Assertion failed
Raised at Odoc_xref2__Tools.assert_not_functor in file "src/xref2/tools.ml", line 1600, characters 9-21
Called from Odoc_xref2__Utils.ResultMonad.bind in file "src/xref2/utils.ml" (inlined), line 10, characters 38-41
Called from Odoc_xref2__Tools.resolve_type.(fun) in file "src/xref2/tools.ml", line 1044, characters 8-129
Called from Odoc_xref2__Utils.ResultMonad.bind in file "src/xref2/utils.ml" (inlined), line 10, characters 38-41
Called from Odoc_xref2__Tools.resolve_type in file "src/xref2/tools.ml", line 1040, characters 8-881
Called from Odoc_xref2__Locations_tools.lookup_loc in file "src/xref2/locations_tools.ml", line 41, characters 6-51
Called from Odoc_xref2__Link.locations in file "src/xref2/link.ml", line 323, characters 12-53
Called from Odoc_xref2__Link.type_decl in file "src/xref2/link.ml", line 832, characters 13-33
Called from Odoc_xref2__Link.signature_items.(fun) in file "src/xref2/link.ml", line 461, characters 41-59
Called from Stdlib__List.fold_left in file "list.ml", line 121, characters 24-34
Called from Odoc_xref2__Link.signature_items in file "src/xref2/link.ml", line 453, characters 4-1023
Called from Odoc_xref2__Link.signature in file "src/xref2/link.ml", line 444, characters 14-44
Called from Odoc_xref2__Link.module_type_expr in file "src/xref2/link.ml", line 766, characters 29-49
Called from Odoc_xref2__Link.module_decl in file "src/xref2/link.ml", line 526, characters 34-64
Called from Odoc_xref2__Link.module_ in file "src/xref2/link.ml", line 501, characters 16-45
Called from Odoc_xref2__Link.signature_items.(fun) in file "src/xref2/link.ml", line 457, characters 45-58
Called from Stdlib__List.fold_left in file "list.ml", line 121, characters 24-34
Called from Odoc_xref2__Link.signature_items in file "src/xref2/link.ml", line 453, characters 4-1023
Called from Odoc_xref2__Link.signature in file "src/xref2/link.ml", line 444, characters 14-44
Called from Odoc_xref2__Link.unit in file "src/xref2/link.ml", line 333, characters 17-58
Called from Odoc_xref2__Lookup_failures.with_ref in file "src/xref2/lookup_failures.ml", line 13, characters 10-14
Called from Odoc_xref2__Lookup_failures.catch_failures in file "src/xref2/lookup_failures.ml", line 60, characters 20-37
Called from Odoc_odoc__Odoc_link.from_odoc.(fun) in file "src/odoc/odoc_link.ml", line 42, characters 6-37
Called from Odoc_odoc__Rendering.document_of_input in file "src/odoc/rendering.ml", line 14, characters 2-62
Called from Odoc_odoc__Rendering.targets_odoc in file "src/odoc/rendering.ml", line 48, characters 6-68
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 22, characters 12-19
Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 35, characters 37-44
5146b2319a30d7fb83e8039f8d2c8d04508aebfe 016f5cb01d191f5953aa80d5222856e406adce99 1357d484223dc8a3d357849a7e97a09a7d76b32a 0ef63b4571fdbfff57cb2d2535127d841402f42e 1d79ac26b58b52f2e43f1e5afb466bee84c3f83c 79f5108029f6dfcfb9cb892544684901f4769f1a
Sorry I can't put much more time into investigating this issue.
Thanks a lot for the time put into this investigation!
It seems that the "render source code" feature, even when not activated (drivers don't know yet how to drive it) is the culprit.
I will see if there is a fix simpler than https://github.com/ocaml/odoc/commit/13ddc152e22820c4f1d45417311295ab2a9aee0b that we can backport to the 2.3 and 2.4 branches!
Do you have many modules without interfaces? It seems possible that the cause is that we're doing extra processing of implementations for source-rendering purposes, which would only show up when processing cmt rather than cmti files.
yes we have a good number of such modules. Generally the repo is fairly large. Tens of thousands of files, millions of lines of code being compiled. In some projects there are basically no mli. The package on which my test takes 40s with odoc 2.2.2 has maybe 300 modules, half those without a corresponding mli.
Ah based on https://github.com/ocaml/odoc/issues/1141#issuecomment-2154552538 I think this is fixed in odoc 3, so I'll close this.