buildkit Multiline strings in dockerfiles (ARG, ENV, LABEL)

As brought up in #2132, it would be nice to allow multi-line strings in ARGs, ENVs, LABELs, etc in Dockerfiles.

One possible suggestion, using the new heredoc parser and syntax would be:

ENV ONE=<<EOF1
content of
env
EOF1

However, this is an expansion of the syntax: heredocs are currently only used in places where files are expected, and none of these cases qualify. We could adapt the syntax slightly, so that a heredoc used as a variable value requires the =<<EOF form, with the equals operator mandatory - this would make the intention behind it clearer.

Alternatively, another suggestion could be to allow strings to continue onto newlines:

ENV ONE="
content of
env
"

This would actually be relatively similar to how interactive shells handle unclosed strings. We could allow this with all quote types, or mimic Go and allow only backticks to be multiline like this.

Not quite sure which option is preferred, or whether it's important enough to attempt to support at all (not sure how relevant a problem it is; I think @thaJeztah might have some ideas).

Jun 12 '21 21:06 jedevc

Alternatively, another suggestion could be to allow strings to continue onto newlines:

I do like that it's less "verbose" (and somewhat more "natural"). Quick first thoughts would be that the downside could be that it also raises the expectation that the same will apply to RUN, i.e.

RUN echo "hello
world"

RUN export FOO="one
two
three"

But the above would still require using line-continuation symbols (as we don't parse the "content" of the RUN)

This would actually be relatively similar to how interactive shells handle unclosed strings.

Gave this a quick try, and that indeed looks to work (at first I thought it stripped the newlines, but that's just echo when printing);

/ # export FOO="one
> two
> three"
/ # env
HOSTNAME=d1cb361a0e64
SHLVL=1
HOME=/root
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
FOO=one
two
three
PWD=/

Jun 14 '21 08:06 thaJeztah

@tonistiigi any thoughts?

Jun 14 '21 08:06 thaJeztah

I'm quite firmly against the heredoc variant. That is not how heredocs work and it only adds confusion.

The quoted case is more consistent but it feels like it might be opening up a lot of new issues. Eg. then everything shell-like should support this for consistency. Meaning that for the RUN commands etc. we would need a full shell parser. There are also other things like comment line stripping that already behave differently from the shell rules.

Jun 14 '21 16:06 tonistiigi

I linked https://github.com/moby/moby/issues/38224 to this ticket (old feature request for multiline env-vars)

Nov 10 '21 15:11 thaJeztah

Coming from https://github.com/docker/cli/issues/3448 as pointed by @thaJeztah!

From my side, the use case for multi-line label comments is based on the "new" generation observability/monitoring tools for container-related workloads which run mostly out of auto-discovery based on container labels. This drives the labels to have to contain more complex data structures/annotations in order to facilitate the agent's configuration at scale since everything is autoscaled and must be auto-discovered rather than centrally declared.

Datadog for instance uses JSON content from the labels to embed configuration for how that container should be monitored...

e.g.:

Today:

FROM my.fancy.repo/redis:6

LABEL "com.datadoghq.ad.check_names"  = "[\"redisdb\"]"
LABEL "com.datadoghq.ad.init_configs" = "[{}]"
LABEL "com.datadoghq.ad.instances"    = "[{\"host\":\"%%host%%\",\"port\": \"%%port%%\", \"command_stats\": \"true\"}]"

...

Preferably, I would have a multi-line heredoc option that lets me declare the JSON with its original formatting, which makes it easier to maintain.

...
...
LABEL "com.datadoghq.ad.instances" = <<-EOF 
[{
"host": "%%host%%",
"port": "%%port%%",
"command_stats": "true"
}]
EOF
...

Note that this is a simple example; some JSON configurations have dozens of lines and multi-object structures. Maybe out of scope here, but even being able to reference a file would help:

FROM my.fancy.repo/redis:6

LABEL "com.datadoghq..."  << "datadog/instrumentation.json"
...

I've lost count of how many times problems were caused by malformed JSON, and having to bisect to find when an error was introduced in inline JSON is really annoying hehe
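
For illustration only, one stopgap is to keep the JSON in its own readable form and generate the single-line LABEL from it. This is a minimal sketch, not anything BuildKit or Datadog provides: the function name label_instruction is made up, and the escaping (a backslash in front of \, " and $) is an assumption about double-quoted Dockerfile values that should be checked against the builder in use. The JSON is parsed first, so malformed input fails before it ever reaches the Dockerfile.

```python
import json


def label_instruction(key: str, pretty_json: str) -> str:
    """Render a single-line LABEL instruction from multi-line JSON text.

    Raises json.JSONDecodeError when the input is not valid JSON.
    """
    data = json.loads(pretty_json)
    # Compact separators keep the label value on one line without padding.
    compact = json.dumps(data, separators=(",", ":"))
    # Assumption: backslash, double quote and dollar sign need a leading
    # backslash inside a double-quoted Dockerfile value.
    escaped = (
        compact.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    )
    return f'LABEL "{key}"="{escaped}"'


if __name__ == "__main__":
    pretty = """[{
    "host": "%%host%%",
    "port": "%%port%%",
    "command_stats": "true"
    }]"""
    print(label_instruction("com.datadoghq.ad.instances", pretty))
```

The printed line can be pasted into (or templated into) the Dockerfile, while the formatted JSON stays the version that gets reviewed and diffed.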

Mar 03 '22 16:03 marceloboeira

It does look like there are things that really are wanting it for LABELs and such :eyes:

Having not thought about this for some time, I'm finding myself agreeing with @tonistiigi above - heredocs are quite good for RUNs and a bit hacky (but cool) for COPY, but trying to get them to work with everything feels a little more hacky.

I kind of like the idea of trying the automatic quote continuation - it fits how sh does it, and handily, it shouldn't break anyone's compatibility, since this currently gives syntax errors:

FROM alpine:latest

LABEL "x"="test
test"

Am tempted to give this a prototype if I get some free time soon - some of the corner cases feel like they would be quite grim to handle, specifically around what it would look like with RUN.

Mar 03 '22 17:03 jedevc

I kind of like the idea of trying the automatic quote continuation

Question with that one (but likely the same with heredocs) would be: what should be used as the newline, taking into account that the syntax would also at some point make its way into Dockerfiles for Windows images (should it serialise with \n or \r\n?). Also wondering if it should take the #escape directive into account (I rarely use that one myself, so I need to dig up where it's taken into account).

I recall there were various discussions around "what to support as labels", and in the end we decided to make it the author's responsibility to decide how to serialise "structured" information (such as JSON), and from docker's perspective it's "just a string".

Perhaps it's not an issue at all (if it's properly documented), but the devil can be in the details at times.
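
To make the \n versus \r\n question concrete, here is a small sketch of one possible policy: treat whatever line endings appear in the Dockerfile as equivalent, and serialise the value with a single, explicitly chosen separator. The function normalize_newlines is hypothetical and does not describe what BuildKit does; it only shows that the choice has to be made somewhere.

```python
def normalize_newlines(value: str, newline: str = "\n") -> str:
    """Rewrite every line break in value to the chosen separator."""
    # Collapse CRLF and lone CR to LF first so mixed input is handled.
    unified = value.replace("\r\n", "\n").replace("\r", "\n")
    # Then emit the separator the caller asked for.
    return unified.replace("\n", newline)


if __name__ == "__main__":
    # Same logical value, as it might look after a CRLF checkout.
    print(repr(normalize_newlines("one\r\ntwo\r\nthree")))
```

With a fixed default, the same Dockerfile would produce the same value regardless of how the file was checked out; passing newline="\r\n" models the alternative of serialising for Windows images.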

Mar 03 '22 20:03 thaJeztah

Finally got a spare moment :tada:

Conclusion from a prototype - this seems quite miserable to do correctly :smile: Newlines, escapes, etc. are all slightly fiddly to deal with, and annoyingly there's also the problem of how interactions with the existing line-continuation symbol should be handled.

The main problem that this has is that logic for parsing is spread out - first off the dockerfile is split into logical "lines", and then each line is parsed separately. To do multiline strings properly, the logic for shell parsing somehow needs to make it into the line splitting logic - the precedent for this would be heredocs, though it would be entirely different logic. Possible to do... but a neat result would probably require some refactoring of the parser to avoid multiple sets of logic for handling quoting rules.
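
A toy model of that two-stage structure, to show where the difficulty sits. This is not BuildKit's parser (which is written in Go); split_logical_lines and its quote_aware flag are invented for the sketch, it only understands double quotes, and it picks one arbitrary answer for how a trailing backslash interacts with an open quote.

```python
def split_logical_lines(text: str, quote_aware: bool = False) -> list[str]:
    """Toy stage-one splitter: physical lines -> logical lines.

    A lone trailing backslash always joins the next physical line.
    With quote_aware=True, an unclosed double quote also keeps the
    logical line open, and the newline inside the quotes is preserved.
    """
    logical: list[str] = []
    current = ""
    in_quote = False
    for line in text.splitlines():
        continued = False
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == "\\":
                if i + 1 == len(line):
                    continued = True  # lone trailing backslash
                i += 2  # skip the escaped character, if any
                continue
            if ch == '"':
                in_quote = not in_quote
            i += 1
        if continued:
            current += line[:-1]  # drop the continuation backslash
        elif quote_aware and in_quote:
            current += line + "\n"  # keep the newline inside the string
        else:
            logical.append(current + line)
            current = ""
            in_quote = False
    if current:
        logical.append(current)
    return logical


if __name__ == "__main__":
    src = 'FROM alpine:latest\nLABEL "x"="test\ntest"\n'
    print(split_logical_lines(src))
    print(split_logical_lines(src, quote_aware=True))
```

Without quote awareness the LABEL is cut in two before any per-instruction parsing runs, so the second stage never sees a complete string. With it, the splitter has to carry its own copy of the quoting rules, which is the duplication of logic described above.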

Weirdly, in playing around I've broken (at least) a couple of test cases: specifically TestParseCases/escapes and TestParseCases/env - these seem to only pass the initial stage of parsing, and would need to change if more shell stuff made it into the line parser.

I like this option over a weird heredocs-mishmash, but it doesn't seem incredible to me (as well as being a total pain to implement) - but not sure what other alternatives there might be for doing multiline labels/etc.

Mar 10 '22 09:03 jedevc

hey all - any interest in the heredoc approach? We'd find it incredibly valuable 🙏

Aug 01 '24 15:08 nsbradford

Would like to see this feature as well. Our use case would be to have small YAML configs embedded into the Dockerfile. I prefer to embed these inside the Dockerfile to be as explicit as possible

Aug 08 '24 08:08 ruzzle