
buildah bud doesn't use cached layers when combined with multi stage build and --label somelabel=somevalue

Open Romain-Geissler-1A opened this issue 1 year ago • 7 comments

Description

While testing one of our OCI build tools against both docker and podman, I noticed that my tool's tests are an order of magnitude slower on podman than on docker. While looking for a minimal reproducer, I found that the slowdown seems related to multi-stage builds combined with labels passed on the command line (if the labels are written directly in the containerfile, caching works). In the following reproducer, I expect the layer with sleep 3 to be cached after a first run, but it is not, so images are always rebuilt completely. In my real case, this sleep 3 is actually a dnf install command.

Steps to reproduce the issue:

  1. Start a container with the very latest buildah: rgeissler@ncerndobedev6097:~> podman run -t -i --rm --privileged --pull=always quay.io/buildah/upstream
  2. Then create a minimal containerfile:
[root@eb194bf8dfa3 /]# mkdir /build-context
[root@eb194bf8dfa3 /]# cat > /build-context/Dockerfile <<END_OF_DOCKERFILE
FROM fedora AS useless_stage_1
FROM useless_stage_1 AS stage_2

RUN sleep 3
END_OF_DOCKERFILE
  3. Build this image once, with layer caching enabled and an explicit label provided on the command line. It builds fine, doing a sleep 3:
[root@eb194bf8dfa3 /]# buildah bud --layers=true --label somelabel=somevalue /build-context/
[1/2] STEP 1/1: FROM fedora AS useless_stage_1
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob ad5077952f52 done   |
Copying config 919a420d29 done   |
Writing manifest to image destination
--> 5c4fc42cf62f
[2/2] STEP 1/3: FROM 5c4fc42cf62ffa7e1649f932ad3b90bb0f39739531291fd204b2021007f3a4ce AS stage_2
[2/2] STEP 2/3: RUN sleep 3
--> 999f19301ff6
[2/2] STEP 3/3: LABEL "somelabel"="somevalue"
[2/2] COMMIT
--> f8c8d0a71f64
f8c8d0a71f64f6c14e46c0de8f61bcabfd4c39a71ea8b99cdd26ccaeda765c56
  4. Re-run the exact same command, hoping that this time the cache is used and sleep 3 is not run. However, the cached layer is not used for the sleep command:
[root@eb194bf8dfa3 /]# buildah bud --layers=true --label somelabel=somevalue /build-context/
[1/2] STEP 1/1: FROM fedora AS useless_stage_1
--> b075ebbee3a6
[2/2] STEP 1/3: FROM b075ebbee3a6450ba3eebbc14319cfe187e4c98dc47c4cb4facb0be7ba798795 AS stage_2
[2/2] STEP 2/3: RUN sleep 3
--> a7223a7828a9
[2/2] STEP 3/3: LABEL "somelabel"="somevalue"
[2/2] COMMIT
--> 61615826c9b7
61615826c9b7a5091f111b15e22a9ad3178ae565ea6b3a65f8f96f4516eae490
  5. Running the same build twice without an explicit --label flag on the command line uses the cache during the second invocation:
[root@eb194bf8dfa3 /]# buildah bud --layers=true /build-context/
[1/2] STEP 1/1: FROM fedora AS useless_stage_1
--> 919a420d29c6
[2/2] STEP 1/2: FROM 919a420d29c6f5ae0bdc8d1872387d3a878d7f69debce9e24f3f2e0506b2ba0d AS stage_2
[2/2] STEP 2/2: RUN sleep 3
[2/2] COMMIT
--> f5684459425d
f5684459425d3f88134b202c6dece0c0e2a2b91860d3bc29a536ad5b2c138e7a
[root@eb194bf8dfa3 /]# buildah bud --layers=true /build-context/
[1/2] STEP 1/1: FROM fedora AS useless_stage_1
--> 919a420d29c6
[2/2] STEP 1/2: FROM 919a420d29c6f5ae0bdc8d1872387d3a878d7f69debce9e24f3f2e0506b2ba0d AS stage_2
[2/2] STEP 2/2: RUN sleep 3
--> Using cache f5684459425d3f88134b202c6dece0c0e2a2b91860d3bc29a536ad5b2c138e7a
--> f5684459425d
f5684459425d3f88134b202c6dece0c0e2a2b91860d3bc29a536ad5b2c138e7a
  6. Adding the LABEL directly inside the original containerfile results in the build correctly using the cache on the very first run:
[root@eb194bf8dfa3 /]# echo 'LABEL "somelabel"="somevalue"' >>/build-context/Dockerfile
[root@eb194bf8dfa3 /]# buildah bud --layers=true /build-context/
[1/2] STEP 1/1: FROM fedora AS useless_stage_1
--> 919a420d29c6
[2/2] STEP 1/3: FROM 919a420d29c6f5ae0bdc8d1872387d3a878d7f69debce9e24f3f2e0506b2ba0d AS stage_2
[2/2] STEP 2/3: RUN sleep 3
--> Using cache f5684459425d3f88134b202c6dece0c0e2a2b91860d3bc29a536ad5b2c138e7a
--> f5684459425d
[2/2] STEP 3/3: LABEL "somelabel"="somevalue"
[2/2] COMMIT
--> 2d4b8f439951
2d4b8f43995156779b1d852b08f419347d9607e72a77d8e35d1f2c3d80ea8f49
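
The check performed by hand in the steps above (build twice, then look for "--> Using cache" in the second log) could be scripted roughly as follows. This is only a minimal sketch: the function names count_cache_hits and check_rebuild_uses_cache are hypothetical helpers, and the only buildah options used are the --layers and --label flags already shown in the transcripts.

```shell
#!/bin/sh
# Hypothetical helper: read a build log on stdin and print how many steps
# were reported as served from the layer cache ("--> Using cache ...").
count_cache_hits() {
    # grep -c exits non-zero when the count is 0, so neutralise that status.
    grep -c -- '--> Using cache' || true
}

# Hypothetical helper: build twice with identical arguments and report
# whether the second build reused at least one cached layer.
# All arguments are forwarded to "buildah bud --layers=true".
check_rebuild_uses_cache() {
    # First build populates the cache; its output is not interesting here.
    buildah bud --layers=true "$@" >/dev/null 2>&1 || return 2
    # Second build: capture stdout and stderr, then count the cache hits.
    hits=$(buildah bud --layers=true "$@" 2>&1 | count_cache_hits)
    if [ "$hits" -gt 0 ]; then
        echo "cache used ($hits cached steps)"
    else
        echo "cache NOT used"
        return 1
    fi
}
```

For example, `check_rebuild_uses_cache --label somelabel=somevalue /build-context/` would correspond to steps 3 and 4, and `check_rebuild_uses_cache /build-context/` to step 5.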

Describe the results you received:

Layer cache is not always used.

Describe the results you expected:

In the scenarios described above, I would expect the layer cache to always be used.

Output of rpm -q buildah or apt list buildah:

buildah-1.31.0-1.20230731174246479315.main.43.g8af2dc4ea.x86_64

Output of buildah version:

Version:         1.32.0-dev
Go Version:      go1.20.6
Image Spec:      1.1.0-rc.4
Runtime Spec:    1.1.0
CNI Spec:        1.0.0
libcni Version:
image Version:   5.27.0-dev
Git Commit:
Built:           Mon Jul 31 17:47:23 2023
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Output of cat /etc/*release:

Fedora release 38 (Thirty Eight)
NAME="Fedora Linux"
VERSION="38 (Container Image)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Container Image)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Container Image"
VARIANT_ID=container
Fedora release 38 (Thirty Eight)
Fedora release 38 (Thirty Eight)

Output of uname -a:

Linux eb194bf8dfa3 5.14.0-162.6.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 30 07:36:03 EDT 2022 x86_64 GNU/Linux

Romain-Geissler-1A avatar Aug 01 '23 13:08 Romain-Geissler-1A

Note: in my case I have this issue in real life, using podman 4.4 from RHEL 9.

Romain-Geissler-1A avatar Aug 01 '23 13:08 Romain-Geissler-1A

Thanks for reporting, I can reproduce this issue.

flouthoc avatar Aug 09 '23 09:08 flouthoc

@Romain-Geissler-1A The issue happens because history is being created incorrectly for the first stage. Until I diagnose the root cause and create a patch for this, a workaround is to add a dummy instruction to the first stage:

FROM fedora AS useless_stage_1
RUN echo "dummy stmt"

FROM useless_stage_1 AS stage_2
RUN sleep 3

flouthoc avatar Aug 09 '23 12:08 flouthoc

Thanks for the hint.

Actually I had already updated my build tool (it generates Dockerfiles automatically, which is why it sometimes produces degenerate cases like this) so that we no longer pass --label flags on the command line but write LABEL statements directly in the generated Dockerfile.
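
As an illustration of that approach, a generator could append the labels to the containerfile along these lines. This is a minimal sketch, not the actual tool: append_labels is a hypothetical helper, and it assumes label keys and values contain no double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: append one LABEL instruction per key=value argument
# to an existing containerfile, instead of passing --label on the command line.
# Usage: append_labels FILE key=value [key=value ...]
# Assumption: keys and values contain no double quotes or backslashes.
append_labels() {
    file=$1
    shift
    for kv in "$@"; do
        key=${kv%%=*}    # text before the first '='
        value=${kv#*=}   # text after the first '='
        printf 'LABEL "%s"="%s"\n' "$key" "$value" >>"$file"
    done
}
```

Calling `append_labels /build-context/Dockerfile somelabel=somevalue` appends the same line as the echo command in the reproducer, after which the image can be built with `buildah bud --layers=true /build-context/` without any --label flag.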

Romain-Geissler-1A avatar Aug 09 '23 12:08 Romain-Geissler-1A

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Sep 09 '23 00:09 github-actions[bot]

@flouthoc any update?

rhatdan avatar Sep 09 '23 08:09 rhatdan

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Oct 11 '23 00:10 github-actions[bot]