buildah icon indicating copy to clipboard operation
buildah copied to clipboard

Reduce --label arguments to a single layer

Open cbandy opened this issue 2 years ago • 15 comments

What type of PR is this?

/kind api-change

What this PR does / why we need it:

#3517 fixed #3370 by adding a step/layer for every --label argument which matches the behavior of Docker build. Buildkit, however, does not add any layer for --label arguments.

Rather than clutter an image with empty layers, we can combine label arguments in a single layer.

How to verify it

Compare different tools using the following Dockerfile and arguments.

FROM alpine
LABEL \
  file1=one \
  file2=two
# Without this PR
$ buildah build --layers --tag cbandy:latest --label arg1=one --label arg2=two .
STEP 1/4: FROM alpine
STEP 2/4: LABEL   file1=one   file2=two
--> 031d0e5ee40
STEP 3/4: LABEL "arg1"="one"
--> c35ed1cd582
STEP 4/4: LABEL "arg2"="two"
COMMIT cbandy:latest
--> 8dc4f003578
Successfully tagged localhost/cbandy:latest
8dc4f0035785abfe7d75ac683b0ce21a1b1f020bc58951dff2b21472ee78a907

$ podman history cbandy:latest
ID            CREATED        CREATED BY                                     SIZE        COMMENT
e66264b98777  3 seconds ago  /bin/sh -c #(nop) LABEL "arg2"="two"           0 B         FROM c35ed1cd5823
<missing>     3 seconds ago  /bin/sh -c #(nop) LABEL "arg1"="one"           0 B         FROM 031d0e5ee405
<missing>     3 seconds ago  /bin/sh -c #(nop) LABEL   file1=one   file...  0 B         FROM docker.io/library/alpine:latest
<missing>     5 days ago     /bin/sh -c #(nop)  CMD ["/bin/sh"]             0 B         
<missing>     5 days ago     /bin/sh -c #(nop) ADD file:8e81116368669ed...  5.81 MB
# With this PR
$ buildah build --layers --tag cbandy:latest --label arg1=one --label arg2=two .
STEP 1/3: FROM alpine
STEP 2/3: LABEL   file1=one   file2=two
--> 56fc28a1e56
STEP 3/3: LABEL "arg1"="one" "arg2"="two"
COMMIT cbandy:latest
--> d2418033201
Successfully tagged localhost/cbandy:latest
d2418033201681ccc30b7a3bce2c2f70a6e7816c5867c8d38f7ce602f7bda20e

$ podman history cbandy:latest
ID            CREATED        CREATED BY                                     SIZE        COMMENT
e66264b98777  4 seconds ago  /bin/sh -c #(nop) LABEL "arg1"="one" "arg2...  0 B         FROM 56fc28a1e566
<missing>     4 seconds ago  /bin/sh -c #(nop) LABEL   file1=one   file...  0 B         FROM docker.io/library/alpine:latest
<missing>     5 days ago     /bin/sh -c #(nop)  CMD ["/bin/sh"]             0 B         
<missing>     5 days ago     /bin/sh -c #(nop) ADD file:8e81116368669ed...  5.81 MB
$ docker buildx build --tag cbandy:latest --label arg1=one --label arg2=two .
…

$ docker history cbandy:latest
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
ef9cbb8f3175        5 days ago          LABEL file1=one file2=two                       0B                  buildkit.dockerfile.v0
<missing>           5 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B                  
<missing>           5 days ago          /bin/sh -c #(nop) ADD file:8e81116368669ed3d…   5.52MB

Which issue(s) this PR fixes:

None

cbandy avatar May 29 '22 16:05 cbandy

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cbandy To complete the pull request process, please assign flouthoc after the PR has been reviewed. You can assign the PR to them by writing /assign @flouthoc in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar May 29 '22 16:05 openshift-ci[bot]

@nalind WDYT?

rhatdan avatar Jun 01 '22 20:06 rhatdan

Labels are unordered in the image config, so I don't really have a problem with us running sort.Strings() on options.Labels before iterating through it to put them into a predictable order on the line that we synthesize. I don't know that we've ever tested with more than one --label option on the buildah build command line, or that the build cache currently still gets a "hit" if the order of multiple --label options on the buildah build command line is changed, so if the tests don't complain, I don't think it breaks anything.

nalind avatar Jun 01 '22 20:06 nalind

Could you let us know the issues caused by the existing behavior.

I like that --label arguments allow me to annotate images without having to go through --build-arg and ARG and LABEL instructions. I like that BuildKit and buildah v1.23.0 do so without adding to the history.

I was bitten by #3370 a few times and appreciate that was fixed in buildah v1.24.0. But now the history of my images is full of LABEL instructions.

I think one problem is that we will miss cache hit if order of label is changed

If I understand correctly, this is true both before and after this PR. Note the args and missing cache messages in the third invocation:

# Without this PR
$ buildah build --layers --tag cbandy:latest --label arg1=one --label arg2=two .
…

$ buildah build --layers --tag cbandy:latest --label arg1=one --label arg2=two .
STEP 1/4: FROM alpine
STEP 2/4: LABEL   file1=one   file2=two
--> Using cache 56fc28a1e5669b7629007a01bd794fffa1fc1382143c0c852868dbb171bef50a
--> 56fc28a1e56
STEP 3/4: LABEL "arg1"="one"
--> Using cache 8c8cd32262c81cfccc989fea54f330817f08e2d2cc37bd583e3aaccf083afe30
--> 8c8cd32262c
STEP 4/4: LABEL "arg2"="two"
--> Using cache c4932f6355b9378abe448ca8d8a02cf972903baeb22d1763df7a3193fabb83da
COMMIT cbandy:latest
--> c4932f6355b
Successfully tagged localhost/cbandy:latest
c4932f6355b9378abe448ca8d8a02cf972903baeb22d1763df7a3193fabb83da

$ buildah build --layers --tag cbandy:latest --label arg2=two --label arg1=one .
STEP 1/4: FROM alpine
STEP 2/4: LABEL   file1=one   file2=two
--> Using cache 56fc28a1e5669b7629007a01bd794fffa1fc1382143c0c852868dbb171bef50a
--> 56fc28a1e56
STEP 3/4: LABEL "arg2"="two"
--> dc36ade8ba4
STEP 4/4: LABEL "arg1"="one"
COMMIT cbandy:latest
--> 821ecad1fde
Successfully tagged localhost/cbandy:latest
821ecad1fde81ae39f46c70ee826805060d6489c3b21db906f72e42b1dcf05f3

with this PR the behavior of buildah differs from both buildkit and docker.

Agreed.

Looking more closely at the output of docker build, I see buildah build still differs with regard to caching. Regardless of the argument order docker build prints:

Step 3/4 : LABEL arg1=one
 ---> Using cache
 ---> a37a055bdea2
Step 4/4 : LABEL arg2=two
 ---> Using cache
 ---> 316a2a04c383

cbandy avatar Jun 02 '22 02:06 cbandy

Labels are unordered in the image config, so I don't really have a problem with us running sort.Strings() on options.Labels before iterating through it to put them into a predictable order

Should the code in this PR sort options.Labels in-place, or make a copy then sort, or should something else sort the slice long before here?

I don't know that we've ever tested with more than one --label option on the buildah build command line, or that the build cache currently still gets a "hit" if the order of multiple --label options on the buildah build command line is changed

I'm not familiar with the test suite, but I also don't see any in bud.bats. 🧐 It looks like bud-multistage-relabel could change a bit to cover this.

cbandy avatar Jun 02 '22 03:06 cbandy

Labels are unordered in the image config, so I don't really have a problem with us running sort.Strings() on options.Labels before iterating through it to put them into a predictable order

Should the code in this PR sort options.Labels in-place, or make a copy then sort, or should something else sort the slice long before here?

I lean toward making a copy in the same place, and sorting it there. We copy the slice out of the options struct in newExecutor(), and that copy is what the build cache has on-hand when it's doing its thing, so I'd prefer it if they didn't end up in different orders.

I don't know that we've ever tested with more than one --label option on the buildah build command line, or that the build cache currently still gets a "hit" if the order of multiple --label options on the buildah build command line is changed

I'm not familiar with the test suite, but I also don't see any in bud.bats. monocle_face It looks like bud-multistage-relabel could change a bit to cover this.

Agreed, that looks like a good place to do it.

nalind avatar Jun 02 '22 17:06 nalind

We copy the slice out of the options struct in newExecutor(), and that copy is what the build cache has on-hand when it's doing its thing, so I'd prefer it if they didn't end up in different orders.

I see where you're talking about, but I'm getting a bit out of my depth. I see other slices that I suspect are also key/value pairs. Would annotations and envs need the same treatment? Perhaps someone else is better equipped to address caching separately from this PR about history?

https://github.com/containers/buildah/blob/fb28761ceb01e3e9050d3567bac721dabdb5953d/imagebuildah/executor.go#L251-L252

https://github.com/containers/buildah/blob/fb28761ceb01e3e9050d3567bac721dabdb5953d/imagebuildah/executor.go#L284-L288

cbandy avatar Jun 03 '22 05:06 cbandy

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Jul 04 '22 00:07 github-actions[bot]

@flouthoc any chance you can work on this?

rhatdan avatar Jul 05 '22 11:07 rhatdan

@cbandy are you still working on this ? If you are fine I can take over this PR and include your commit.

flouthoc avatar Jul 05 '22 11:07 flouthoc

@cbandy are you still working on this ? If you are fine I can take over this PR and include your commit.

@flouthoc I have nothing new for this. Please feel free.

cbandy avatar Jul 05 '22 19:07 cbandy

@cbandy: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Jul 07 '22 22:07 openshift-ci[bot]

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Aug 07 '22 00:08 github-actions[bot]

@flouthoc any time to work on this?

rhatdan avatar Aug 07 '22 10:08 rhatdan

Yes I'll cherry-pick the commits.

flouthoc avatar Aug 08 '22 10:08 flouthoc

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Nov 16 '22 00:11 github-actions[bot]

@flouthoc Did you ever get a chance to look at this?

rhatdan avatar Nov 18 '22 20:11 rhatdan

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Dec 20 '22 00:12 github-actions[bot]

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Jan 20 '23 00:01 github-actions[bot]

I think @nalind has already implemented this here https://github.com/containers/buildah/pull/4511, hence closing this please re-open if i am mistaken.

flouthoc avatar Jan 20 '23 04:01 flouthoc