[DISCUSSION] Documentation generation for builtins (Functions, Aggregates, Commands, ...)

Open Vedin opened this issue 6 months ago • 6 comments

Writing documentation for every builtin by hand would be unnecessary work. We need a common way to structure our code and comments so that documentation for functions is generated automatically. I see the following options:

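  1. cargo doc. It renders every /// or //! comment into a fully searchable HTML site (standalone Markdown files are only picked up when they are pulled in explicitly, e.g. with #[doc = include_str!("...")]). rustdoc also has a JSON output format if we want to reuse the data somewhere, although as far as I know it is still unstable and needs nightly.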
  2. cargo-readme. Turns crate-level docs (//!) into a README so crates.io/GitHub always match. Useful for keeping an up-to-date README for every crate, but I don't think it is useful for our purposes.
  3. mdBook. Builds a book from md files. Looks like overkill for this purpose because we'd need to create an md file for each function.
  4. Automatic OpenAPI / Swagger docs, which are mostly used for websites. They are more suitable for API documentation than for our function code.
  5. Reuse the datafusion user_doc macro, which is common in datafusion/functions and helps generate docs in a more structured way.

I personally prefer option 1 or 5. We currently have both in our codebase, and some functions only have plain comments. We need to decide which one is the standard and refactor everything accordingly. This is especially important if we want to propose a standard template for them.
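
To make option 1 concrete, here is a minimal sketch of what a rustdoc comment on a builtin could look like. `char_length` and the section layout (Syntax, Arguments, Example) are assumptions made up for illustration, not an existing function or an agreed template.

```rust
/// Returns the number of characters (not bytes) in `input`.
///
/// Hypothetical builtin, used only to illustrate the comment layout.
///
/// # Syntax
///
/// `CHAR_LENGTH(input)`
///
/// # Arguments
///
/// * `input` - the string to measure.
///
/// # Example
///
/// `SELECT CHAR_LENGTH('héllo')` returns `5`.
pub fn char_length(input: &str) -> usize {
    // Count Unicode scalar values rather than UTF-8 bytes.
    input.chars().count()
}
```

Running `cargo doc` on a crate containing this would render the headings as sections on the function's page.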

Vedin avatar May 28 '25 13:05 Vedin

@ravlio @osipovartem @rampage644 Any ideas on this?

Vedin avatar May 28 '25 13:05 Vedin

Do you know what the datafusion story with user_doc is? I tried to figure it out quickly and couldn't. If it is not going to be deprecated, I would just go with the datafusion way.

rampage644 avatar May 28 '25 22:05 rampage644

What is our goal? If it is static docs, does user_doc somehow generate them? If I'm not mistaken, user_doc is needed to make autocomplete possible?

ravlio avatar May 29 '25 11:05 ravlio

https://docs.rs/datafusion-functions/47.0.0/datafusion_functions/datetime/current_date/index.html I don't see any user_doc macro expansions in generated DF docs.

ravlio avatar May 29 '25 11:05 ravlio

> https://docs.rs/datafusion-functions/47.0.0/datafusion_functions/datetime/current_date/index.html I don't see any user_doc macro expansions in generated DF docs.

It's not exposed in the Rust docs but in their own documentation: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#current-date. They build it with Sphinx: https://www.sphinx-doc.org/en/master/
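
The general idea behind option 5 (structured metadata attached to each function, rendered into a separate user guide) can be sketched roughly as below. This is not DataFusion's user_doc API; `FunctionDoc`, its fields, and `to_markdown` are hypothetical names invented for this example, and the entry text is made up.

```rust
/// Hypothetical structured description of one builtin function.
/// NOT DataFusion's `user_doc` API, only an illustration of the idea.
#[derive(Debug, Clone)]
pub struct FunctionDoc {
    pub name: &'static str,
    pub description: &'static str,
    pub syntax_example: &'static str,
    /// Pairs of (argument name, explanation).
    pub arguments: Vec<(&'static str, &'static str)>,
}

impl FunctionDoc {
    /// Renders the entry as a Markdown fragment that a docs site could include.
    pub fn to_markdown(&self) -> String {
        let mut out = format!("### `{}`\n\n{}\n\n", self.name, self.description);
        out.push_str(&format!("Syntax: `{}`\n", self.syntax_example));
        if !self.arguments.is_empty() {
            out.push_str("\nArguments:\n\n");
            for (arg, help) in &self.arguments {
                out.push_str(&format!("- **{}**: {}\n", arg, help));
            }
        }
        out
    }
}

/// Example entry; the wording is invented for this sketch.
pub fn current_date_doc() -> FunctionDoc {
    FunctionDoc {
        name: "current_date",
        description: "Returns the current date.",
        syntax_example: "current_date()",
        arguments: vec![],
    }
}
```

A small generator binary could then iterate over all registered functions and write the fragments into files that a tool such as Sphinx or mdBook picks up.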

Vedin avatar May 29 '25 11:05 Vedin

@Vedin Actually, this issue is about documenting database functions, but my comment is about documenting the Rust code itself.

cargo doc is the standard, similar to how Javadoc or other inline documentation systems were used back in the day, so for me it’s a solid and non-optional choice.

Here’s how I envision the documentation process:

  • Every function should be documented, whether private or public.
  • Each module or crate file should have a top-level comment in cargo doc format.
  • Each crate’s README.md should include documentation extracted from the top-level comments of its module files, combined to fill the relevant sections of the README. It doesn't include docstrings for individual functions, but the top-level docs can reference them if needed.
  • Documentation can be initially generated by AI agents. It may not be perfect, but the quality can be quite decent.
  • Since AI might generate content that needs to be fixed, the generated docs should be submitted as pull requests. These can then be reviewed, commented on, and rejected if needed. The expectation is that either another commit will be added to address the comments, or a follow-up PR will be created by the AI.
  • Developers can also contribute fixes directly. In some cases, the more you try to get AI to fix something, the worse it becomes, so a manual correction can be the simplest path. Maybe this is the only way to polish docs after AI.
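
The first two points could be backed by lints rather than by convention alone. A minimal sketch, assuming a hypothetical `builtins` module: rustc's `missing_docs` lint covers public items, and Clippy's `missing_docs_in_private_items` (a restriction lint, off by default) extends the check to private ones.

```rust
// Warn on undocumented public items (built-in rustc lint).
#[warn(missing_docs)]
// Clippy's restriction lint extends the check to private items.
#[warn(clippy::missing_docs_in_private_items)]
pub mod builtins {
    //! Hypothetical module of builtin string functions.
    //!
    //! This top-level `//!` comment is the part a README generation
    //! step could extract.

    /// Trims whitespace from both ends of `input`.
    pub fn trim(input: &str) -> &str {
        strip(input)
    }

    /// Private helper, documented as well per the "private or public" rule.
    fn strip(input: &str) -> &str {
        input.trim()
    }
}
```

In a real crate the same attributes would more likely go at the crate root as `#![warn(...)]`, so that CI flags undocumented items in review instead of relying on reviewers to spot them.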

YaroslavLitvinov avatar Jun 03 '25 16:06 YaroslavLitvinov