
Containerfile/Dockerfile parser

Open westurner opened this issue 1 year ago • 12 comments

STORY: Users can parse Dockerfile and Containerfile with universal-ctags in order to navigate and review with tool support.

  • man Containerfile: https://github.com/containers/common/blob/main/docs/Containerfile.5.md
  • List of Dockerfile instructions; e.g. 1 or more FROM, RUN, ARG, ENV,: https://docs.docker.com/reference/dockerfile/
  • There's a grammar for parsing docker image URLs: https://pkg.go.dev/github.com/distribution/reference#pkg-overview
  • There's an OCI Image manifest (JSON) spec: https://docs.docker.com/reference/dockerfile/
  • AFAIU there is not (yet?) a PEG grammar for Dockerfile / OCI Containerfile
  • FWIU there needn't be a PEG grammar to add something to parsers/?
  • FWIW here's Python's PEG grammar: https://docs.python.org/3/reference/grammar.html#full-grammar-specification
  • It's been awhile since I've written any C; I would have to llm it and only then write tests for my rusty c code.
    • https://github.com/universal-ctags/ctags/blob/master/Units/parser-iPythonCell.r/default-formats.d/input.py

westurner avatar Mar 27 '24 18:03 westurner

I have not read this issue carefully yet. Here is a .ctags file I wrote a while ago:

#
# containerfile.ctags --- regex parser for Containerfile and Dockerfile
#
#  Copyright (c) 2023, Red Hat, Inc.
#  Copyright (c) 2023, Masatake YAMATO
#
#  Author: Masatake YAMATO <[email protected]>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
# Reference: https://docs.docker.com/engine/reference/builder/
# 
--langdef=Containerfile
--map-Containerfile=+(Containerfile)
--map-Containerfile=+(Dockerfile)

--kinddef-Containerfile=a,arg,arguments
--kinddef-Containerfile=e,env,environment variables
--kinddef-Containerfile=i,image,images
--_roledef-Containerfile.{image}=from,specified in FROM

--regex-Containerfile=/^ARG[[:space:]]+([^[:space:]=]+)/\1/a/{exclusive}
--regex-Containerfile=/^ENV[[:space:]]+([^[:space:]=]+)/\1/e/{exclusive}
--regex-Containerfile=/^FROM[[:space:]]+(--[^[:space:]]*[[:space:]]+)?([^[:space:]]+)([[:space:]]+(as|AS)[[:space:]]+([^[:space:]]+))?//{exclusive}{{
    \2 /image /from _reftag _commit pop
    \5 false ne{
        \5 /image _tag _commit \2 inherits:
    } if
}}

This can be used as a starting point.

masatake avatar Mar 27 '24 23:03 masatake

The main task of ctags is to extract names newly introduced in a target file. ctags extracts such names as definition tags. Though we have extended the task to extract names referenced or used, extracting definition tags is a higher priority.

I don't think RUN introduces a new name.

From reading https://www.tohoho-web.com/docker/dockerfile.html (Japanese), LABEL introduces names. So, the .ctags file should support it.

The critical issue is that the .ctags file doesn't support a command that spans multiple lines, like:

ENV DB_HOST="192.168.2.201" \
    DB_PORT="3306" \
    DB_USER="myapp" \
    DB_PASSWD="ZbGc7#adG87GBfVC" \
    DB_DATABASE="sample"

To extract DB_PORT, DB_USER, ..., we must switch from the line-oriented meta parser to the multi-table meta parser (https://docs.ctags.io/en/latest/optlib.html#advanced-pattern-matching-with-multiple-regex-tables).
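
To illustrate why line-by-line matching misses the continued names, here is a minimal Python sketch. It is not ctags code. It assumes that a trailing backslash joins physical lines and that ENV pairs are written as NAME=value; the function names `env_names_line_oriented` and `env_names_joined` are hypothetical.

```python
import re

SAMPLE = '''ENV DB_HOST="192.168.2.201" \\
    DB_PORT="3306" \\
    DB_USER="myapp" \\
    DB_PASSWD="ZbGc7#adG87GBfVC" \\
    DB_DATABASE="sample"
'''

# Rough Python equivalent of a line-oriented ENV pattern.
LINE_PATTERN = re.compile(r"^ENV[ \t]+([^\s=]+)", re.MULTILINE)


def env_names_line_oriented(text):
    """Match each physical line on its own: only the first name is seen."""
    return LINE_PATTERN.findall(text)


def env_names_joined(text):
    """Join backslash-continued lines first, then collect every NAME= pair."""
    joined = re.sub(r"\\\n", " ", text)
    names = []
    for line in joined.splitlines():
        match = re.match(r"ENV[ \t]+(.*)", line)
        if match:
            # Assumption: values never contain whitespace followed by NAME=.
            names.extend(re.findall(r'(?:^|\s)([^\s="]+)=', match.group(1)))
    return names
```

The first function only ever sees the name on the ENV line itself, while the second sees all five names once the continuation lines are joined. A multi-table optlib parser gets a similar effect by consuming the backslash-newline pairs explicitly instead of joining the text up front.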

In my experience, a Containerfile is not very large. The performance of the parser may not be important, so a regex-based optlib parser is enough for the purpose.

Do you want to implement such a parser yourself? I don't intend to take away your joyful hacking time :-P

masatake avatar Mar 28 '24 03:03 masatake

Thanks. I can't commit to owning a parser like this; but here's this for parsing from https://docs.docker.com/reference/dockerfile/ :

$$('article table:first-of-type tr code').map((el) => el.innerText).reduce((a,b) => a + "\n" + b)
"ADD
ARG
CMD
COPY
ENTRYPOINT
ENV
EXPOSE
FROM
HEALTHCHECK
LABEL
MAINTAINER
ONBUILD
RUN
SHELL
STOPSIGNAL
USER
VOLUME
WORKDIR"

westurner avatar Mar 28 '24 04:03 westurner

I don't understand why you want to show all the commands. Ctags is not a general navigation tool. It focuses on definitions. We need a list of all commands that define names or introduce NEW names.

masatake avatar Mar 28 '24 05:03 masatake

#
# containerfile.ctags --- regex parser for Containerfile and Dockerfile
#
#  Copyright (c) 2023, 2024, Red Hat, Inc.
#  Copyright (c) 2023, 2024, Masatake YAMATO
#
#  Author: Masatake YAMATO <[email protected]>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
# Reference: https://docs.docker.com/engine/reference/builder/
# 
--langdef=Containerfile
--map-Containerfile=+(Containerfile)
--map-Containerfile=+(Dockerfile)

--kinddef-Containerfile=a,arg,arguments
--kinddef-Containerfile=e,env,environment variables
--kinddef-Containerfile=i,image,images
--_roledef-Containerfile.{image}=from,specified in FROM
--kinddef-Containerfile=l,label,labels

--_tabledef-Containerfile=main
--_tabledef-Containerfile=skipComment
--_tabledef-Containerfile=next
--_tabledef-Containerfile=arg
--_tabledef-Containerfile=env
--_tabledef-Containerfile=label

--_mtable-regex-Containerfile=skipComment/#[^\n]*//

--_mtable-regex-Containerfile=next/\\\n//{tleave}
--_mtable-regex-Containerfile=next/\n//{tleave}{_advanceTo=0start}
--_mtable-regex-Containerfile=next/[^\\\n]+//

--_mtable-extend-Containerfile=main+skipComment
--_mtable-regex-Containerfile=main/(ARG[ \t]+(\\\n)?|ARG\\\n)//{tenter=arg}
--_mtable-regex-Containerfile=main/(ENV[ \t]+(\\\n)?|ENV\\\n)//{tenter=env}
--_mtable-regex-Containerfile=main/(LABEL[ \t]+(\\\n)?|LABEL\\\n)//{tenter=label}
--_mtable-regex-Containerfile=main/FROM[ \t]+(--[^ \t]*[ \t]+)?([^ \t\n]+)([ \t]+(as|AS)[ \t]+([^ \t\n]+))?//{{
     \2 /image /from @2 _reftag _commit pop
     \5 false ne{
         \5 /image @5 _tag _commit \2 inherits:
     } if
}}
--_mtable-regex-Containerfile=main/[^\n]+//
--_mtable-regex-Containerfile=main/.//

--_mtable-regex-Containerfile=arg/[ \t]+//
--_mtable-regex-Containerfile=arg/([^[:space:]=]+)/\1/a/{tenter=next}
--_mtable-regex-Containerfile=arg/\n//{tleave}
--_mtable-regex-Containerfile=env/[ \t]+//
--_mtable-regex-Containerfile=env/([^[:space:]=]+)/\1/e/{tenter=next}
--_mtable-regex-Containerfile=env/\n//{tleave}
--_mtable-regex-Containerfile=label/[ \t]+//
--_mtable-regex-Containerfile=label/([^[:space:]=]+)/\1/l/{tenter=next}
--_mtable-regex-Containerfile=label/\n//{tleave}

masatake avatar Mar 28 '24 12:03 masatake

My use case for [universal-]ctags (#354) is vim-tagbar [1], which is described as follows:

Tagbar is a Vim plugin that provides an easy way to browse the tags of the current file and get an overview of its structure. It does this by creating a sidebar that displays the ctags-generated tags of the current file, ordered by their scope. This means that for example methods in C++ are displayed under the class they are defined in.

(FWIW where tagbar doesn't get it, vim-voom [2] has Markdown and RST outline editing. I still have a custom config, but e.g. SpaceVim [3] has TagBar installed too)

[1] https://github.com/preservim/tagbar
[2] https://github.com/vim-voom/VOoM/blob/master/doc/voom.txt
[3] https://spacevim.org/use-vim-as-ide/

So IDK if just all of the tokens are worth indexing for Containerfile. RUN and ENTRYPOINT and HEALTHCHECK are probably significant enough tokens in the file to be useful for navigation with tagbar and similar for e.g. vscode.

westurner avatar Mar 28 '24 14:03 westurner

Buildah (Apache 2.0) has many Containerfile test cases:

  • https://github.com/containers/buildah/blob/main/tests/bud/multi-stage-builds/Dockerfile.arg_in_stage
  • https://github.com/containers/buildah/blob/main/tests/bud/multi-stage-builds/Dockerfile.extended

westurner avatar Mar 28 '24 14:03 westurner

jupyter-docker-stacks Dockerfiles aren't that long because they build FROM other Dockerfiles, but as a useful Dockerfile that demonstrates the utility of tagbar+ctags, there's docker-stacks-foundation/Dockerfile, which specifies e.g. the NB_USER arg and so on: https://jupyter-docker-stacks.readthedocs.io/en/latest/ https://github.com/jupyter/docker-stacks/blob/main/images/docker-stacks-foundation/Dockerfile

What's a better example of a gnarly Dockerfile where this functionality will be helpful?

westurner avatar Mar 28 '24 14:03 westurner

For documentation languages, we violate the principle of "making a tag for definition."

However, for Containerfile/Dockerfile, I want to uphold the principle. If I introduce a parser for these languages, the parser may only extract definitions. If you want to make tags for objects other than definitions, extend the built-in parser with --regex-Containerfile=... options in your .ctags.

https://github.com/containers/buildah/blob/main/tests/bud/multi-stage-builds/Dockerfile.extended

This Dockerfile is quite a good example. Thank you.

I am surprised at ENV "BUILD_LOGLEVEL"="5". The left-side variable is surrounded by double-quote characters.
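
The quoted key matters because a pattern such as `([^[:space:]=]+)` would capture the quote characters as part of the name. Below is a minimal Python sketch of one way to normalize it, assuming a name may be wrapped in one pair of matching double or single quotes. The helper `env_name` is hypothetical and is not something ctags provides.

```python
import re

# Capture the key of the first NAME=value pair after ENV, quotes included.
ENV_KEY = re.compile(r"^ENV[ \t]+([^\s=]+)=")


def env_name(line):
    """Return the ENV variable name with surrounding quotes removed, or None."""
    match = ENV_KEY.match(line)
    if match is None:
        return None
    raw = match.group(1)
    # Strip one pair of matching quotes, if present.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]
    return raw
```

In an optlib parser the same effect would more likely come from a pattern that matches the optional quotes outside the capture group.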

FROM is used more than once. An image name specified at FROM/AS is a scope for ENV, ARG, and LABEL. If only FROM is used, the parser must generate an image name to fill the scope fields of ENV, ARG, and LABEL.
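
A minimal Python sketch of that scoping rule follows; it is not the parser itself. It assumes one instruction per line (no continuations), that an unnamed stage receives its zero-based index as a generated name, and that names defined before the first FROM have no scope. `scoped_definitions` is a hypothetical helper.

```python
import re

FROM_RE = re.compile(
    r"^FROM[ \t]+(?:--\S+[ \t]+)?(\S+)(?:[ \t]+(?:as|AS)[ \t]+(\S+))?"
)
DEF_RE = re.compile(r"^(ARG|ENV|LABEL)[ \t]+([^\s=]+)")


def scoped_definitions(text):
    """Return (kind, name, scope) tuples; scope is the enclosing build stage."""
    results = []
    stage_index = -1
    scope = None  # definitions before the first FROM have no stage
    for line in text.splitlines():
        from_match = FROM_RE.match(line)
        if from_match:
            stage_index += 1
            # Use the AS name when given, otherwise a generated numeric name.
            scope = from_match.group(2) or str(stage_index)
            continue
        def_match = DEF_RE.match(line)
        if def_match:
            results.append((def_match.group(1), def_match.group(2), scope))
    return results
```

Using the stage index as the generated name is only one possible choice; a real parser might prefer the image name or an anonymous-tag scheme.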

https://docs.podman.io/en/stable/markdown/podman-build.1.html

Podman-build runs CPP. Therefore, #define DEF may appear in a Containerfile. ctags should extract DEF as a CPP macro.

Can we satisfy these requirements with .ctags? To get the answer to this question, I will implement the parser by myself.

masatake avatar Mar 28 '24 15:03 masatake

  • [ ] A PEG and/or EBNF grammar for Containerfile / Dockerfile would probably make the job much easier. (Other projects could generate Dockerfile parsers in whatever language from such a grammar.)

Isn't it possible to ~ distill such a grammar from a number of examples, such as the already-referenced buildah and docker container builder test cases? Podman builds containers with Buildah. Nerdctl and Docker > 23.0 build containers with BuildKit.

BuildKit; where are the Dockerfile syntax examples tested by BuildKit?:

  • https://github.com/moby/buildkit/blob/master/Dockerfile (453 lines; worth navigating with ctags (and tagbar or similar))
  • https://github.com/moby/buildkit/blob/b3d57264721b7ba42895c22c1d5040525d4e32fe/Makefile#L42
  • https://github.com/moby/buildkit/blob/master/hack/test
  • https://github.com/moby/buildkit/tree/master/hack/dockerfiles
  • https://github.com/moby/moby/blob/master/Dockerfile
  • https://github.com/mbentley/docker-buildkit-tests

Buildah's test Dockerfiles appear to be the most complete set of test Dockerfiles / Containerfiles I'm aware of.

  • https://github.com/topics/peg
  • https://www.google.com/search?q=grammar+learning+from+corpora+site%253Agithub.com

westurner avatar Mar 30 '24 05:03 westurner

Weeks later, FWIW, there's probably already regex-based syntax highlighting for Dockerfile

  • https://meta.stackoverflow.com/questions/403721/dockerfile-syntax-highlighting
    • [ ] SO lacks dockerfile highlight.js support
  • https://github.com/codemirror/codemirror5/blob/master/mode/dockerfile/dockerfile.js
  • https://github.com/microsoft/vscode/issues/95728
    • https://github.com/moby/moby/issues/40841#issuecomment-617029507
      • https://github.com/moby/moby/blob/master/contrib/syntax/textmate/Docker.tmbundle/Syntaxes/Dockerfile.tmLanguage
      • https://github.com/moby/moby/blob/master/contrib/syntax/vim/README.md
      • https://github.com/vim/vim/blob/master/runtime/syntax/dockerfile.vim

westurner avatar Apr 11 '24 18:04 westurner

It focuses on definitions. We need a list of all commands that define names or introduce NEW names.

FROM X AS Y is what you want to consider. There are implicit numerical names given.

I do not see a great value in trying to add references from commands like COPY, ADD, MOUNT, RUN to other layers. Having a way to navigate between FROM is more than enough. Tags support for Dockerfile seems fairly useless imho and just adds complexity to the tooling with few real use cases.

Example


FROM alpine:latest

# RUN ..

FROM debian:latest

# Here we copy from stage 0 (no symbolic name was given, so its automatic name is 0)
COPY --from=0 /build/artifact /usr/local/bin
ENTRYPOINT ["/usr/local/bin/artifact"] 
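
The implicit numbering in this example can be sketched in Python. This is a minimal illustration, assuming one instruction per line and that a `--from=` value which is neither a stage name nor a valid stage index refers to something outside the file, such as an external image. `resolve_copy_sources` is a hypothetical name.

```python
import re

FROM_RE = re.compile(
    r"^FROM[ \t]+(?:--\S+[ \t]+)?\S+(?:[ \t]+(?:as|AS)[ \t]+(\S+))?"
)
COPY_FROM_RE = re.compile(r"^COPY[ \t]+.*?--from=(\S+)")


def resolve_copy_sources(text):
    """Map each COPY --from=... to the index of the stage it refers to."""
    stage_names = []  # position in this list is the implicit stage number
    resolved = []
    for line in text.splitlines():
        from_match = FROM_RE.match(line)
        if from_match:
            stage_names.append(from_match.group(1))  # None when there is no AS
            continue
        copy_match = COPY_FROM_RE.match(line)
        if copy_match:
            ref = copy_match.group(1)
            if ref in stage_names:
                resolved.append((ref, stage_names.index(ref)))
            elif ref.isdigit() and int(ref) < len(stage_names):
                resolved.append((ref, int(ref)))
            else:
                resolved.append((ref, None))  # not a stage in this file
    return resolved
```

For the Dockerfile above, the single `COPY --from=0` resolves to the first FROM, which is the kind of FROM-to-FROM navigation described here.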

hholst80 avatar May 12 '24 13:05 hholst80