
High level view, why bin-doc, and a proposal

Open constfun opened this issue 11 years ago • 2 comments

This turned into a wall of text, sorry.

I've been reading the code and would appreciate help with my mental model.

The way I see it:

  • opam-doc proxies calls to ocamlc(.opt), running bin-doc against the resulting .cmt file.
  • bin-doc reads the .cmt file, extracts the doc info, and stores it in a new (?) cmd format.
  • opam-doc-index then reads the cmd files and builds html from that using Cow.

That raises questions in my mind:

  • Why not just grab the compile target at the ocamlc proxy point and pass it on to ocamldoc, utilizing existing tools/format?
  • Would ocamldoc fail with such "arbitrary" input?
  • Is this why we need to compile each package in the first place? To get at the cmt files, which contain type information after all the includes and opens are done?
  • Is feeding cmt files to ocamldoc possible?

Docs and opam in general

  • How does this project relate to build-doc sections in 'opam' file spec?

Proposal

Provide a custom generator for ocamldoc that spits out json (perfect for web), and give package authors the option to specify build-doc steps that must output a single json file in a known format (e.g. { "package_name" : { "modules" : [...] } }).
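A minimal sketch of what emitting that shape could look like, using only the OCaml standard library. The record fields `name` and `doc` and every function name here are assumptions made for illustration; the proposal only fixes the outer `{ "package_name" : { "modules" : [...] } }` shape, and this is not an actual ocamldoc generator.

```ocaml
(* Hypothetical sketch: render { "package_name" : { "modules" : [...] } }.
   Nothing here is part of ocamldoc, bin-doc or opam-doc. *)

(* Quote a string as a JSON string literal, escaping what JSON requires. *)
let json_string s =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '"';
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      | '\n' -> Buffer.add_string buf "\\n"
      | c when Char.code c < 0x20 ->
          Buffer.add_string buf (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char buf c)
    s;
  Buffer.add_char buf '"';
  Buffer.contents buf

(* One documented module; both field names are assumptions. *)
type module_doc = { name : string; doc : string }

let json_of_module m =
  Printf.sprintf "{ \"name\" : %s, \"doc\" : %s }"
    (json_string m.name) (json_string m.doc)

(* The single per-package JSON document described in the proposal. *)
let json_of_package pkg modules =
  Printf.sprintf "{ %s : { \"modules\" : [%s] } }"
    (json_string pkg)
    (String.concat ", " (List.map json_of_module modules))
```

A real generator would more likely use a JSON library than hand-rolled escaping; the point is only that the output side is small.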

Benefits:

  • Alleviates the need for compiling the world just to get the docs.
  • Gives package owners control over doc output for opam.
  • Gives us structured docs for the entire opam repo (eventually.)
  • Makes building unique doc sites much easier, since the json format is universally understood and accessible.
  • Is more robust, as there is no garbage collected from tests/examples/etc.
  • Is simpler: the code for a custom generator is trivial and more approachable than the lexer, parser, and custom format that bin-doc requires.

Cost:

  • If opam-doc-index is to be used it will require a significant rewrite to consume json instead of cmd.
  • Working code would be dropped, namely, all of bin-doc.
    Personally, I don't grow attached to code, and consider this a benefit of a simpler solution; others may disagree.

Anecdotally

To build the doc site that I want, I would have to build a cmd to json serializer of some sort. I would much prefer to build an ocamldoc-json custom generator, because that can be used outside of opam. In fact, I've started, and there is also this.

Any and all comments are very much appreciated. If I'm not making sense, tell me. =)

constfun, Jan 16 '14 18:01

An update for posterity, based on conversation in #ocaml.

(Note: Some of this is fit for a wiki, but https://github.com/ocamllabs/opam-doc/wiki is inaccessible for some reason.)

The most important point that was made is this: bin-doc is meant to replace ocamldoc upstream.

ocamldoc has the following, difficult to resolve, issues:

  • It does not support -packing, meaning it cannot combine multiple packages into a single doc page.
  • It cannot handle module includes, meaning that include Module confuses ocamldoc.

The last (both?) of these points are a result of ocamldoc trying to make its own sense out of the input source files, ignoring the work that the compiler does.

bin-doc, on the other hand, takes advantage of the type information that is output during the compile step (with -bin-annot enabled.)

With this in mind:

  • opam-doc-index is to bin-doc what a custom generator is to ocamldoc.
  • The cmd format is a good thing.
  • Hence, something like bin-doc-to-json is a valid way forward, as is opam-doc-index. These would be analogous to -html, -latex, etc. generators in ocamldoc.

This addresses most of the questions that I raised before.

The only remaining question is the relationship between opam-doc and the build-doc section in an opam file.

What if we give package authors the option to specify build-doc steps that must output cmd files?

  • The resulting cmd files can then be used by opam to build doc pages (similarly to ocamldoc allowing for different output formats.)
  • As before, this gives developers finer control over what goes into the docs.
  • As before, it alleviates some of the problem of having to create a separate compiler switch and rebuilding the world.

That is unless I'm still misunderstanding something. =)

constfun, Jan 17 '14 00:01

> bin-doc reads the .cmt file, extracts the doc info, and stores it in a new (?) cmd format.

It actually reads the .ml(i) file to extract the doc info. Currently the doc info is not in the .cmt(i) file. opam-doc then combines the .cmt(i) file with the .cmd(i) file to produce the documentation.
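To make the file pairing concrete, here is a small sketch that maps a source file to the two artifacts that get combined for it. The function name and the assumption that the artifacts sit next to the source are illustrative only, not how opam-doc locates its files.

```ocaml
(* Hypothetical helper: for a source file, name the typed-tree artifact
   (.cmt / .cmti) and the doc artifact (.cmd / .cmdi) combined for it.
   Assumes, for illustration, that both sit next to the source. *)
let artifacts src =
  let base = Filename.remove_extension src in
  match Filename.extension src with
  | ".ml" -> Some (base ^ ".cmt", base ^ ".cmd")
  | ".mli" -> Some (base ^ ".cmti", base ^ ".cmdi")
  | _ -> None (* not an OCaml source file *)
```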

> ocamldoc has the following, difficult to resolve, issues:
>
>   • It does not support -packing, meaning it cannot combine multiple packages into a single doc page.
>   • It cannot handle module includes, meaning that include Module confuses ocamldoc.

Another important issue is with producing fully cross-referenced documentation across all the packages in OPAM. Since ocamldoc just uses strings to handle cross-references, it cannot distinguish the Foo module in one package from the Foo module in another package. opam-doc, on the other hand, knows which module was actually linked to during compilation and can produce a correct reference.
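A toy sketch of the difference, with made-up package names (`pkg_a`, `pkg_b`) and a made-up URL scheme. It models only the idea of string-keyed versus resolved references, not the data structures either tool actually uses.

```ocaml
(* A module identity that includes the package it came from. *)
type resolved = { package : string; modname : string }

(* Two hypothetical packages that both define a module named Foo. *)
let index =
  [ { package = "pkg_a"; modname = "Foo" };
    { package = "pkg_b"; modname = "Foo" } ]

(* String-based cross-reference: only the module name is known, so every
   module with that name is a candidate target. *)
let candidates_by_name name =
  List.filter (fun r -> r.modname = name) index

(* Resolved cross-reference: the package is known, so there is exactly one
   target to link to. The URL layout is an assumption. *)
let url r = Printf.sprintf "%s/%s.html" r.package r.modname
```

With this index, `candidates_by_name "Foo"` yields two candidates, while a resolved reference such as `{ package = "pkg_b"; modname = "Foo" }` maps to a single link.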

> The last (both?) of these points are a result of ocamldoc trying to make its own sense out of the input source files, ignoring the work that the compiler does.

Yes, that's basically it. The compiler keeps a lot more information than it did when ocamldoc was originally written, so most of ocamldoc's front-end is now obsolete.

> Hence, something like bin-doc-to-json is a valid way forward, as is opam-doc-index. These would be analogous to -html, -latex, etc. generators in ocamldoc.

Yes, but I wouldn't rush to make something just yet. This is still very much a prototype and the final version will probably be a bit different. For a start, I'm currently thinking about putting the documentation info in the .cmt(i) files by default, and only using a separate .cmd file for alternative documentation (i.e. translations) and things which have no corresponding .cmt file (like tutorials -- another intended use case for this work).

It is also worth noting that when the work is upstreamed there will no longer be any need for a special compiler switch.

lpw25, Jan 17 '14 01:01