Decide on syntax for Lua doc comments

Open wincent opened this issue 5 months ago • 1 comments

We need a syntax that doesn't clash with standard Lua doc comments (ie. as used by lua-language-server).

These start with a triple dash (---) and can contain annotations (eg. @param).

  --- @param name string
  --- @paramz 1
  --- test here
  ---
  --- |link-to-thing|

Note that Neovim highlights anything that looks like an annotation, even if invalid (eg. @paramz). Note also that lua-language-server looks at the contents of these comments and uses them to power its static analysis features (and to report problems via line diagnostics).

Also note how links (eg. |link-to-thing|) are highlighted, ~although I'm not sure why~... The lua-language-server docs say you can refer to other symbols using Markdown syntax:

---@alias MyCustomType integer

---Calculate a value using [my custom type](lua://MyCustomType)
function calculate(x) end

According to :InspectTree, this is just comment_content:

      (comment ; [496, 2] - [496, 21]
        content: (comment_content)) ; [496, 4] - [496, 21]

And, after pressing a (to reveal anonymous nodes) and I (to show language source):

      (comment ; [496, 2] - [496, 21] lua
        start: "--" ; [496, 2] - [496, 4] lua
        content: (comment_content)) ; [496, 4] - [496, 21] lua

But :Inspect reveals more detail:

Treesitter
  - @comment.lua links to Comment   priority: 100   language: lua
  - @spell.lua links to @spell   priority: 100   language: lua
  - @comment.documentation.lua links to Comment   priority: 100   language: lua

Semantic Tokens
  - @lsp.type.keyword.lua links to Keyword   priority: 125
  - @lsp.mod.documentation.lua links to @lsp   priority: 126
  - @lsp.typemod.keyword.documentation.lua links to Tag   priority: 127

Anyway, in order not to clash, here are some valid comment syntaxes:

-- this is a valid lua comment, but only a single line one.
--[ also good, also single-line.
--[= still good, but also single-line.
--[=[ not good unless paired with ]=]
--[=[ but note...
it is multiline...
-- can precede every line with `--` if you want
]=]
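
The rules above can be sketched as a small classifier. This is only an illustration in plain Lua, not something from docvim: `classify_comment` is a hypothetical helper, and it only looks at how a line opens (it doesn't check that a long comment is ever closed).

```lua
-- Hypothetical helper: classify how a Lua comment line opens.
-- Returns nil for non-comments, "doc" for `---`, "line" for single-line
-- comments, and "long" plus the bracket level for `--[[`, `--[=[`, etc.
local function classify_comment(line)
  local rest = line:match("^%s*%-%-(.*)$")
  if rest == nil then
    return nil -- not a comment at all
  end
  -- A long comment needs `[`, zero or more `=`, then another `[`.
  local level = rest:match("^%[(=*)%[")
  if level ~= nil then
    return "long", #level
  end
  -- A third dash makes it look like a lua-language-server doc comment.
  if rest:sub(1, 1) == "-" then
    return "doc"
  end
  return "line"
end

classify_comment("--[ also good")     -- returns "line"
classify_comment("--[= still good")   -- returns "line"
classify_comment("--[=[ but note...") -- returns "long", 1
classify_comment("--[[[")             -- returns "long", 0
classify_comment("--- @param name")   -- returns "doc"
```

The point being that anything opening with `--[[` (whatever follows it) is a long comment as far as Lua is concerned, and never looks like a `---` doc comment.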

In the end, I think I like this the best:

--[[[

@mappings

# This is a heading

## A subheading...

### A sub-subheading

stuff in here...

#### More...

`ls` to see more...

```
console.log('code in here')
```

    for (let i = 0; i < 10; i++) {
      // hmmm..
      console.log("crap here");
      return false;
    }

> A block quote... More block quotes?

- This is a list.
- More list...

**THIS IS A WARNING.** I _emphasize_ this...

--]]

-- The only thing I don't like 👆 is that you end with `--]]` and not `--]]]`,
-- so it's not symmetrical...

--[[[

@option g:FerretLoaded any

To prevent Ferret from being loaded, set |g:FerretLoaded| to any value in your
|.vimrc|. For example:

```
let g:FerretLoaded=1
```
--]]
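
Because Lua sees `--[[[` as an ordinary long comment (`--[[` followed by a literal `[`) that ends at the first `]]`, the body of these blocks can be pulled out with plain pattern matching. A rough sketch, assuming the blocks are always closed with `--]]`; `extract_doc_blocks` is a hypothetical name, and a real implementation would want a proper parser so that string literals containing `--[[[` aren't mistaken for comments:

```lua
-- Hypothetical helper: return the text of every `--[[[ ... --]]` block.
local function extract_doc_blocks(source)
  local blocks = {}
  -- `.-` is a lazy match, so each block stops at the first `]]`.
  for body in source:gmatch("%-%-%[%[%[(.-)%]%]") do
    -- The `--` in the closing `--]]` is part of the comment body; drop it.
    body = body:gsub("%-%-$", "")
    -- Trim surrounding whitespace and blank lines.
    body = body:match("^%s*(.-)%s*$")
    blocks[#blocks + 1] = body
  end
  return blocks
end

local source = [==[
local loaded = true
--[[[

@option g:FerretLoaded any

To prevent Ferret from being loaded...
--]]
return loaded
]==]

for _, block in ipairs(extract_doc_blocks(source)) do
  print(block)
end
```

This also shows why the closing delimiter can't be `--]]]`: the comment is over as soon as Lua sees `]]`, so a third `]` would be left behind as a stray token in the code.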

I tried to get the embedded Markdown highlighted via injections, but my current attempt only highlights **bold** stuff.

It seems to have something to do with :help vim.hl.priorities:

vim.hl.priorities                                          *vim.hl.priorities*
    Table with default priorities used for highlighting:
    • `syntax`: `50`, used for standard syntax highlighting
    • `treesitter`: `100`, used for treesitter-based highlighting
    • `semantic_tokens`: `125`, used for LSP semantic token highlighting
    • `diagnostics`: `150`, used for code analysis such as diagnostics
    • `user`: `200`, used for user-triggered highlights such as LSP document
      symbols or `on_yank` autocommands

semantic_tokens defaults to 125, which means that you can see the Markdown highlighting (via Treesitter, which is priority 100) when you load the buffer, but once the LSP is ready, it overrides that highlighting (marking the comment as a "semantic token") with a higher priority, causing the Markdown highlighting to go away.

I tried hacking it to see what would happen:

vim.hl.priorities.semantic_tokens = 95

It ends up showing more of the Markdown syntax even after the LSP has loaded, but it breaks the lua-language-server comments.
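
A toy model of what is presumably going on, in plain Lua (no Neovim APIs). It assumes a simplified "highest priority wins" rule, whereas Neovim actually layers highlight attributes; the `+ 2` offset for the typemod group is read off the :Inspect output above (125/126/127), and `winner`, `markdown_heading` and `annotation` are made-up names:

```lua
-- Simplified model: the capture with the highest priority wins outright.
local function winner(captures)
  local best
  for _, capture in ipairs(captures) do
    if best == nil or capture.priority > best.priority then
      best = capture
    end
  end
  return best.group
end

-- A Markdown heading inside a `--[[[` comment.
local function markdown_heading(p)
  return {
    { group = "@markup.heading.1.markdown", priority = p.treesitter },
    { group = "@lsp.type.comment.lua", priority = p.semantic_tokens },
  }
end

-- An annotation such as `@param` inside a `---` comment.
local function annotation(p)
  return {
    { group = "@comment.documentation.lua", priority = p.treesitter },
    { group = "@lsp.typemod.keyword.documentation.lua", priority = p.semantic_tokens + 2 },
  }
end

local defaults = { treesitter = 100, semantic_tokens = 125 }
local hacked = { treesitter = 100, semantic_tokens = 95 }

print(winner(markdown_heading(defaults))) -- @lsp.type.comment.lua
print(winner(annotation(defaults)))       -- @lsp.typemod.keyword.documentation.lua
print(winner(markdown_heading(hacked)))   -- @markup.heading.1.markdown
print(winner(annotation(hacked)))         -- @comment.documentation.lua
```

In other words, under this model any single global value for semantic_tokens fixes one kind of comment at the expense of the other.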

:InspectTree shows it being picked up as a heading:

        (section ; [505, 0] - [546, 4]
          (atx_heading ; [505, 0] - [506, 0]
            (atx_h1_marker) ; [505, 0] - [505, 3]
            heading_content: (inline ; [505, 4] - [505, 21]
              (inline))) ; [505, 4] - [505, 21]

And with a and I:

        (section ; [505, 0] - [546, 4] markdown
          (atx_heading ; [505, 0] - [506, 0] markdown
            (atx_h1_marker) ; [505, 0] - [505, 3] markdown
            heading_content: (inline ; [505, 4] - [505, 21] markdown
              (inline))) ; [505, 4] - [505, 21] markdown_inline

:Inspect:

Treesitter
  - @comment.lua links to Comment   priority: 100   language: lua
  - @spell.lua links to @spell   priority: 100   language: lua
  - @markup.heading.1.markdown links to Title   priority: 100   language: markdown
  - @spell.markdown links to @spell   priority: 100   language: markdown

Semantic Tokens
  - @lsp.type.comment.lua links to Comment   priority: 125

FWIW, did this experiment using this ~/.config/nvim/queries/lua/highlights.scm:

;; extends

; "@mappings" @keyword.mappings

; (#set! "priority" 200)
; [
;   "@mappings"
; ] @keyword @nospell

and this ~/.config/nvim/queries/lua/injections.scm:

;; extends

; Inject markdown into multiline comments that start with `--[[[` (note the
; extra `[`):
(comment
  content: (_) @injection.content
  (#lua-match? @injection.content "^%[")
  (#set! injection.language "markdown")
  (#offset! @injection.content 0 1 0 0)
  (#set! injection.combined)
  (#set! injection.include-children))

; problem, once LSP kicks in, it sets @lsp.type.comment.lua, linking to Comment
; with Priority 125 (sigh) "Semantic Tokens"
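
For what it's worth, the filtering part of that query is easy to check outside of Neovim, because `#lua-match?` uses ordinary Lua patterns. A small sketch, assuming (as the query does) that the comment's content node for a long comment starts right after the `--[[` opener; both function names are made up:

```lua
-- Mirrors `(#lua-match? @injection.content "^%[")`: only content that
-- begins with a literal `[` (the third bracket of `--[[[`) qualifies.
local function is_docvim_block(content)
  return content:find("^%[") ~= nil
end

-- Mirrors `(#offset! @injection.content 0 1 0 0)`: skip that one leading
-- `[` so it isn't handed to the Markdown parser.
local function markdown_source(content)
  return content:sub(2)
end

local content = "[\n\n# This is a heading\n\nstuff in here...\n--"
if is_docvim_block(content) then
  print(markdown_source(content))
end
```

Note that the trailing `--` from the closing `--]]` is still part of what gets injected.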

I think I could make a custom LSP server that attaches to the buffer and provides its own semantic tokens (at least, Claude says you can do that, and that Neovim will merge all tokens additively), but that seems like boiling the ocean for something that should be doable much more simply...

Jul 17 '25 08:07 wincent

Ok, so I've concluded that:

- I can't highlight stuff that the tree-sitter parser hasn't parsed into nodes.
- Tree-sitter won't look inside comments, so there are no internal nodes to highlight.
- Changing that would require creating a new parser, which would be a lot of work.
- Even if I do create such a parser, the LSP server's semantic tokens will still win because they have higher priority.
- Creating a custom LSP server may not be so much work, as I already have a Lua parser and a Markdown parser in the docvim project; I could spike out a prototype there.
- If I can add semantic tokens from that LSP server, I could also have it provide autocompletions etc that would make writing documentation easier.
I think I'm going to give it a shot.
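
One detail worth noting for the prototype: an LSP server doesn't send semantic tokens as absolute positions. Each token is five integers (delta line, delta start character, length, token type index, modifier bitset), with positions relative to the previous token. A minimal sketch of that encoding in plain Lua; `encode_semantic_tokens` and the shape of the input tables are made up for illustration, and the type indices only mean something relative to whatever legend the server advertises:

```lua
-- Hypothetical helper: turn absolute tokens into the flat, relative
-- integer array used by `textDocument/semanticTokens/full` responses.
-- Note: sorts the input table in place.
local function encode_semantic_tokens(tokens)
  table.sort(tokens, function(a, b)
    if a.line ~= b.line then
      return a.line < b.line
    end
    return a.start < b.start
  end)

  local data = {}
  local prev_line, prev_start = 0, 0
  for _, token in ipairs(tokens) do
    local delta_line = token.line - prev_line
    -- The start is only relative when the token is on the same line as
    -- the previous one; otherwise it is the absolute column.
    local delta_start = token.start
    if delta_line == 0 then
      delta_start = token.start - prev_start
    end
    data[#data + 1] = delta_line
    data[#data + 1] = delta_start
    data[#data + 1] = token.length
    data[#data + 1] = token.type
    data[#data + 1] = token.modifiers
    prev_line, prev_start = token.line, token.start
  end
  return data
end

local data = encode_semantic_tokens({
  { line = 2, start = 0, length = 19, type = 0, modifiers = 0 },
  { line = 2, start = 5, length = 3, type = 1, modifiers = 0 },
  { line = 4, start = 2, length = 4, type = 1, modifiers = 0 },
})
print(table.concat(data, ",")) -- 2,0,19,0,0,0,5,3,1,0,2,2,4,1,0
```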

Good ol' Claude Code estimates:

⏺ 3-5 weeks of work. The project has excellent foundations - existing Lua and Markdown parsers with precise location tracking. Main tasks are adding LSP server boilerplate (tower-lsp), and building the comment extraction → Markdown parsing → semantic tokens pipeline.

The hard parsing work is already done. This would integrate cleanly with the existing architecture.
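
To get a feel for that "comment extraction → Markdown parsing → semantic tokens" pipeline, here is a deliberately naive mock-up in plain Lua. It is not how docvim does (or will do) it: `heading_tokens` is a made-up name, the "Markdown parsing" is just a pattern for `#` headings (so a `#` line inside a fenced code block would be misdetected), and lengths are counted in bytes rather than the UTF-16 code units that LSP positions use by default.

```lua
-- Hypothetical sketch: find Markdown headings inside `--[[[ ... --]]`
-- blocks and report them as absolute, 0-based token positions.
local function heading_tokens(source)
  local tokens = {}
  local in_block = false
  local line_no = 0
  for line in (source .. "\n"):gmatch("(.-)\n") do
    if not in_block then
      -- Step 1: comment extraction (entering a doc block).
      if line:find("^%-%-%[%[%[") then
        in_block = true
      end
    elseif line:find("%]%]") then
      -- The first `]]` closes the long comment.
      in_block = false
    elseif line:find("^#+ ") then
      -- Steps 2 and 3: "parse" the Markdown and emit a token.
      tokens[#tokens + 1] = {
        line = line_no,
        start = 0,
        length = #line,
        type = "heading",
      }
    end
    line_no = line_no + 1
  end
  return tokens
end

local source = table.concat({
  "local x = 1",
  "--[[[",
  "",
  "# This is a heading",
  "",
  "stuff in here...",
  "--]]",
  "# not inside a doc block",
}, "\n")

for _, token in ipairs(heading_tokens(source)) do
  print(token.line, token.start, token.length, token.type)
end
```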

Jul 21 '25 15:07 wincent