buildkit Proposal: support multiple formats for syntax directives

See #2812 for context, and #2904 for the related proposal regarding compatibility with shebang lines in Dockerfiles.

This proposal is presented as code to give a clearer view of what an implementation may look like, however, this is unlikely to be the final implementation. The code introduces basic support for two additional ways of syntax, so that the full list of possibilities is:

#syntax=foo, for script-style comments supported in most scripting languages, and Dockerfiles
//syntax=foo, for C-style comments supported in C/C++ and javascript/typescript
{"syntax": "foo", ...}, for JSON documents

(Note that XML support isn't included, and that JSON document support is included as a demonstration of how we could expand this support in the future).

The code is slightly more complex than it appears it needs to be - this is to handle a couple of constraints:

The default directive style should remain as #key=value
Mix-and-matching different directive styles should not be supported
Additionally allow easy extension in the future for other directive styles

Importantly, this does not change the implementation of how Dockerfiles parse their directives, these are defined in a separate regex that remains unmodified.

Please share your thoughts :tada:

Jun 29 '22 14:06 jedevc

I think what you're describing is how it works currently. This directive parsing code modified is only used for extracting the syntax directive, the directive parsing used to actually parse the dockerfile is located here. The utilities in directives.go are only used to determine to forward to a gateway.

The fallback behavior is also as you describe - we run through the directive parsers one-by-one, attempting to extract a syntax key from each - until we find one. If none are found, we can just give up and assume that it's the default # and parse directives from there.

Jul 19 '22 09:07 jedevc

I think what you're describing is how it works currently.

With the fallback ParseDirectives() should stay as it is currently(you can optimize regexp). Only in DetectSyntax() when ParseDirectives() call doesn't return a syntax directive it should try if the first line is //, skip shebang if needed, try json etc.

Jul 19 '22 16:07 tonistiigi

Alright sounds good - I think I was confused since DetectSyntax seems to be the only thing that ever calls the ParseDirectives function, but I suppose there could be external callers we'd want to maintain compatibility for.

Jul 19 '22 17:07 jedevc

Have done another pass through this, ParseDirectives now only parses # directives, while DetectSyntax searches through all the supported possibilities. I've also added a commit to skip shebang lines for #-style comments, I'm not sure if we want them for the other styles, since they won't be valid comments in the language that uses them, so I'm inclined to not allow it (or to wait until we get a valid use case).

I wonder if we might consider extending ParseDirectives in a follow-up? I do see a couple of codebases using it, so it might be nice to allow users to specify a syntax parameter for something like ParseDirectivesForSyntax to parse for multiple languages.

Jul 25 '22 10:07 jedevc

Have completely rewritten the code to be significantly less clever, so the DetectSyntax function should be much easier to read from top-to-bottom.

In addition to adding the shebang, c-style comments and json parsing, I've also moved the appropriate functionality into the parser package and exposed the directives parser properly, so that we can get identical behavior to how the directives parsed inside the parser (which was actually different before, e.g. doing different casing, etc.)

Nov 15 '22 13:11 jedevc

buildkit buildkit copied to clipboard

Proposal: support multiple formats for syntax directives

buildkit
buildkit copied to clipboard