agda Documentation for Agda code à la Haddock

Documentation for Agda code à la Haddock

(I couldn't find any previous issues discussing docstrings for Agda, so I created this one.)

I'd like to propose we add docstrings to Agda, so we can generate API documentation, and more thoroughly document, for instance, agda-stdlib.

1. Where can docstrings be placed? Docstrings are always placed on the line directly before, indented by any amount, and may accompany a module declaration, function definition, datatype definition or constructor.

2. What is the syntax of docstrings? Docstrings consist of a series of line comments started by -- | :

-- | The identity function.
id : {A : Set} -> A -> A
id a = a

Docstrings are written using the CommonMark markdown format. They are parsed as follows:

We take the docstring as a whole.
We remove the first four characters of each line plus a number of characters equal to the number of spaces after the -- | on the first line.
We parse the remaining text using commonmark.

Note that each successive line *must start with -- | as well, which would be a departure from Haskell. The reason is that Markdown is space sensitive, and the pipe character makes it very clear where the left margin is. Alternatively, we could relax this to Note that this means that each successive line must start with -- , i.e., two dashes and two spaces. If any characters are consumed while removing the first four characters, this should be a warning. The reason for doing this is that Markdown is whitespace sensitive.

2.1 Special syntax for code blocks I propose two extensions for including code blocks in docstrings:

any series of comment lines starting with -- > should be rendered as checked Agda code blocks; and
any series of comment lines starting with -- < should be rendered as unchecked Agda code blocks.

These extensions follow Haskell's Bird notation and lhs2TeX's extension, discriminating between code blocks and specification blocks. They are rendered by textually replacing them by fenced code blocks before running the CommonMark parser, i.e., we replace...

-- | The identity function.
-- > id 1
id : {A : Set} -> A -> A
id a = a

...by...

-- | The identity function.
--  ``` agda
--  id 1
--  ```
id : {A : Set} -> A -> A
id a = a

...and...

-- | The identity function.
-- < id 1
id : {A : Set} -> A -> A
id a = a

...by...

-- | The identity function.
--  ``` agda spec
--  id 1
--  ```
id : {A : Set} -> A -> A
id a = a

2.2 Checked code blocks I propose two different strategies for checking code blocks in docstrings, which I'll refer to as snippets and definitions.

Snippets: Snippets, such as the code block containing `id 1`, are checked as an anonymous definition in a where clause. (Perhaps, an extension for checking equivalences would be useful, e.g., if the following line is `-- = 1` it would be checked as `_ : id 1 == 1; _ = refl`?)
Definitions: Definitions are checked as top-level definitions appended at the end of the current file.

Agda distinguishes between snippets and definitions solely by checking whether or not the block matches the regular expression \b=\b, i.e., whether it contains an equals sign that isn't part of a larger token. If it doesn't, it classifies it as a snippet, otherwise it is a definition.

If we would like to extend this in the future, we can allow further configuration via code block classes. For instance, to ensure something is parsed as a snippet, we could write

-- | ```agda snippet
-- | id 1
-- | ```

If it becomes clear that additional approaches are needed, we could either refine the algorithm for determining whether something is a snippet or a definition, or devise some syntax to allow the programmer to add these classes to the special syntax for code blocks, e.g., by listing additional classses on the line immediately preceding the block.

2.3 Special syntax for references to variables I propose we allow references to ∏-bound named parameters of the type by prefixing them with an @, so for instance:

-- | The identity function.
-- |
-- | @A
-- | : The type of the argument taken by the identity function.
-- | @a
-- | : The value which will be returned unchanged.
id : {A : Set} -> (a : A) -> A
id a = a

We could process these after commonmark finishes processing the Markdown, as they'll be unchanged, and end up in the text nodes, or we could process them before if we would like them to be an alias for specific Markdown syntax, e.g., a Markdown link referring specifically to the anchor attached to the actual binding site, which would increase portability of the API docs.

Regardless, the important thing is that they're checked. The example above leverages native Markdown definition lists to typeset them neatly, so we'd not have anything to do in this regard.

Sep 01 '21 08:09 wenkokke

@andreasabel Apologies, my cat submitted the issue when the only content was the title "Documentation". I've since added the actual intended content.

Sep 01 '21 09:09 wenkokke

Once we have these docstrings clearly attached to specific definitions, we can use them to improve discovery via C-c C-z by allowing users to look for terms in docs.

Sep 11 '21 16:09 gallais

Related: #3198, #4988.

Sep 14 '21 09:09 nad

Some comments and questions:

I think it makes sense to have something like this.
You did not give any use cases for definitions.
Definitions are checked as top-level definitions appended at the end of the current file.

Perhaps "at the end of the current file" can be replaced by "at the end of the innermost module".

Is the idea that these definitions should be available for use somewhere else, in regular code and/or docstrings?
Snippets [...] are checked as an anonymous definition in a where clause.

Is the idea that it should not be possible to use earlier definitions in snippets?

What does "a where clause" mean? For instance, it does not make sense to attach a where clause to a data type definition, and there could be several clauses in a function definition.
How would docstrings be type-set by, say, the LaTeX backend? Like plain comment text?
If we add support for both docstrings and type-checked comments (#4988), how should they interact? I think it makes sense to design the two features together.
Docstrings might provide a (partial) solution to issue #3198: When there is a docstring we can link to the comment instead of the definition.

(Offtopic: It appears as if GitHub now includes the title when rendering issue links inside lists. I don't think this used to be the case. Is it possible to turn off this feature?)

Sep 14 '21 10:09 nad

You did not give any use cases for definitions.

I was thinking of allowing the user to show how to use the definition, together with other definitions in the module, in an example, such as for example in: https://github.com/tweag/linear-base/blob/master/src/Data/Unrestricted/Internal/Ur.hs#L13-L21

Perhaps "at the end of the current file" can be replaced by "at the end of the innermost module".

That is a great suggestion, and probably what we should go with. I think I was reasoning under the flawed assumption of "one file, one module".

Is the idea that these definitions should be available for use somewhere else, in regular code and/or docstrings?

I would say not, at very least to start.

Is the idea that it should not be possible to use earlier definitions in snippets?

It should be possible to use anything that is in scope while defining the function.

What does "a where clause" mean? For instance, it does not make sense to attach a where clause to a data type definition, and there could be several clauses in a function definition.

Good point. I think that snippets should only be allowed on definitions, and they should be checked as follows:

-- > blerp A + 1 
blerp A = 1
blerp B = 2

-- ... is checked as ...

blerp = blerpDef
  where
    blerpDef A = 1
    blerpDef B = 2
    _ = blerpDef A + 1

However, maybe your solution in #4988 would be a better alternative for interpreting snippets? We could check them as nameless definitions, at the end of the current module, with the body F (blerp A + 1) for a fresh postulate as described.

How would docstrings be type-set by, say, the LaTeX backend? Like plain comment text?

To start with, as plain comment text. Later, we can investigate converting the Markdown syntax to TeX and HTML, but we would likely incur a large number of extra dependencies by doing so.

If we add support for both docstrings and type-checked comments (Type-checked comments #4988), how should they interact? I think it makes sense to design the two features together.

I think that, to begin, this proposal should supersede #4988, keeping in mind that there is room for extending the current proposal in the future. For instance, I considered suggesting the syntax...

-- > 5 + 3
-- = 8

... to document computation, by checking it similar to your suggestion, ...

_ : 5 + 3 ≡ 8
_ = refl

Similar extensions could be designed for your previous suggestion which checks whether or not an expression is well-formed.

However, I purposefully didn't include these suggestions in the proposal to avoid over-complicating it.

Docstrings might provide a (partial) solution to issue Links to comments rather than code #3198: When there is a docstring we can link to the comment instead of the definition.

Seems like a solid plan to generate stable anchors for docstrings!

Sep 14 '21 13:09 wenkokke

Offtopic: It appears as if GitHub now includes the title when rendering issue links inside lists. I don't think this used to be the case. Is it possible to turn off this feature?

I don't know why you're getting this behaviour. Just referencing the issue as #5555 does not do this, as far as I can tell: #5555.

Sep 14 '21 13:09 wenkokke

I don't know why you're getting this behaviour. Just referencing the issue as #5555 does not do this, as far as I can tell: #5555.

Try to put the link inside a list:

#5555.

Sep 14 '21 14:09 nad

Good point. I think that snippets should only be allowed on definitions, and they should be checked as follows:
-- > blerp A + 1
blerp A = 1
blerp B = 2

-- ... is checked as ...

blerp = blerpDef
  where
    blerpDef A = 1
    blerpDef B = 2
    _ = blerpDef A + 1

A problem with this approach is that blerp <neutral> reduces to <generated name> <neutral>.

I think that, to begin, this proposal should supersede #4988, keeping in mind that there is room for extending the current proposal in the future.

Some differences between the proposals:

The current proposal has code blocks, the other one inline code.
The current proposal supports unchecked code.
The other proposal (hopefully) supports use of generalisable variables.

If you implement this proposal, then I suspect that it would be straightforward to add support for inline checked, and perhaps also unchecked, code in the future. We could perhaps use lhs2TeX notation: |checked| and @unchecked@. I guess CommonMark does not support syntax highlighting for inline code snippets, but one could just wrap the code in backticks (unless the code contains one or more backticks, but then one can use code like ```code with `` two consecutive backticks```).

Sep 14 '21 15:09 nad

Well, I like docstrings well enough, but markdown (even commonmark) seems kind of bare-bones, and would more-or-less innevitably begin to accumulate "extensions", likely agda-specific ones, and since markdown wasn't designed to be extensible, these would have their own idiosynchratic parsing rules. (I suppose this might be less work than designing an idiosynchratic format from scratch, but is the resulting format any easier to use?)

Regardless, you don't even need a docstring format to begin work on such functionality: simply being able to format the signatures of stuff exported from each top-level module into an html file would be useful in its own right: Agda (like most dependently-typed languages, I guess?) has ample facilities for abbreviation, which can sometimes make it difficult to be certain what implicit parameters/fields a given entity has.

Aug 03 '23 22:08 SamB

Agda outsider/newbie comment (perhaps reflecting misunderstandings at some points): I love this proposal. The idea of including type checking in documentation processing helps to avoid docs that don't make sense, and probably makes it easier to prevent documentation that is out of sync with the code it describes. Using Markdown or something else seems valuable to me, too.

Question: Would this proposal mean if the code in a file doesn't type check, then new documentation won't be accessible in a generated file or in Emacs? During development, it can be useful to document intended behavior even when the code is still broken. Not sure if this is a real problem, as there are ways to kludge code so it type checks.

I also wonder it could make sense to accept an intermediate proposal could be accepted before the details of this one are agreed upon. Seems like it would be good to have a convention e.g. simply to to use -- | followed by -- on subsequent lines, whether or not code snippets are type checked, and whether or not full Markdown is allowed. I don't think it's even necessary to support something like Mathjax for a subset of LaTeX, since people already have ways to embed Unicode characters in source files.

My thought is that as long as an agreed upon proposal doesn't get in the way of subsequent refinement, it could be adopted earlier, with the idea that there's an intention to refine it further.

(I'm used to languages with much more primitive documentation facilities than what's proposed here, where there's been discussion and disagreement about how to improve formatting of documentation, but no agreement. The limitations of the documentation functionality has been frustrating, but the fact that functions are often documented and that it's trivial to find the documentation when it exists is very nice. As someone learning Agda, the lack of easy-to-find documentation for functions is sometimes frustrating.)

Apr 30 '24 18:04 mars0i