tree-sitter-dockerfile Make language injections easier by making nodes anonymous

Without heredoc_line and shell_fragment being present in a tree it's much easier to add language injections, for example, with bash:

((shell_command) @injection.content
  (#set! injection.language "bash"))

((run_instruction
  (heredoc_block) @injection.content)
  (#set! injection.language "bash"))

With the way it's implemented right now it doesn't seem to be possible to do these kinds of injections correctly. There is a Neovim issue with a bit more details for shell_command injections https://github.com/nvim-treesitter/nvim-treesitter/issues/6530

Here are the injections used at the moment in nvim-treesitter:

((comment) @injection.content
  (#set! injection.language "comment"))

((shell_command
  (shell_fragment) @injection.content)
  (#set! injection.language "bash")
  (#set! injection.combined))

((run_instruction
  (heredoc_block) @injection.content)
  (#set! injection.language "bash")
  (#set! injection.include-children))

There doesn't seem to be a better way to "join" shell_fragment nodes than using injection.combined, which isn't quite great since it treats all of the scripts in the whole dockerfile as a single large script, breaking highlighting in a weird way. With these changes every RUN / CMD / ENTRYPOINT instruction can be injected independently, but without splitting it into different chunks between line_continuation nodes.

For heredoc, include-children is needed to highlight all heredoc_line nodes, but it also includes heredoc_end node, which isn't quite right. Injecting every heredoc as bash is not itself correct though, but that's a different issue I think.

Thanks for maintaining this project! :+1:

May 13 '24 08:05 faergeek

Playing with examples/1 I noticed one more thing. With code like this:

RUN apk update && apk add --no-cache \
    # some comment here
    bash

bash after the comment is treated as a command inside of an injection instead of an argument, since comment is consumed up until the newline (excluding newline itself), so effectively it is parsed by bash as:

apk update && apk add --no-cache \

    bash

Note the blank line there.

It should be interpreted like this though:

apk update && apk add --no-cache \
    bash

For that to work comment should also consume a newline. I tried adding a newline into regex there and all tests are still passing. I couldn't come up with any negative consequences of that yet, so it should probably be added as well.

May 13 '24 18:05 faergeek

See https://github.com/nvim-treesitter/nvim-treesitter/issues/6975 for a related issue. (The current setup makes it hard to parse multiline commands as bash, since we need to inject into fragments and combine them together -- and that can lead to false positives.)

Jul 24 '24 15:07 clason

Any progress on this?

Apr 05 '25 01:04 reteps

I've rebased on top of main and made comment consume a newline.

Not sure what else I need to do, but let me know if there is something. I've never worked with treesitter grammars before.

Apr 05 '25 03:04 faergeek

tree-sitter-dockerfile tree-sitter-dockerfile copied to clipboard

Make language injections easier by making nodes anonymous

tree-sitter-dockerfile
tree-sitter-dockerfile copied to clipboard