buildah Environment --secrets broken when running buildah inside a container

Description

When running buildah bud inside a container, the --secret type=env option does not work. When the corresponding RUN --mount=type=secret,id=... command in the Dockerfile/Containerfile is executed, an error occurs.

If the container is started with --cap-add CAP_SYS_ADMIN, the build completes without an error. Of course, it would be better if such a high-risk capability were not required in order to use this feature.

I've included output from an AlmaLinux 9 VM running on WSL2, but I have observed exactly the same behavior on a RHEL 8 box. The issue occurs regardless of whether the container is a podman container or a docker container, rootful or rootless. The CAP_SYS_ADMIN workaround works in all of these scenarios.
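
To tell which of the two situations a given container is in, the effective capability mask can be inspected from inside it. The following is a minimal sketch, not part of buildah: the helper name has_cap_sys_admin is made up for illustration, and it relies only on CAP_SYS_ADMIN being capability number 21 and on the CapEff field of /proc/<pid>/status. Having the bit set inside a user namespace does not by itself guarantee that the remount succeeds.

```shell
#!/bin/sh
# Sketch: report whether the current shell holds CAP_SYS_ADMIN.
# CAP_SYS_ADMIN is capability number 21 in linux/capability.h.
CAP_SYS_ADMIN_BIT=21

# has_cap_sys_admin MASK (hypothetical helper):
# succeeds if the hexadecimal capability MASK has bit 21 set.
has_cap_sys_admin() {
  [ -n "$1" ] && [ $(( (0x$1 >> CAP_SYS_ADMIN_BIT) & 1 )) -eq 1 ]
}

# Effective capability mask of this shell, as printed in /proc/<pid>/status.
cap_eff=$(awk '/^CapEff:/ {print $2}' "/proc/$$/status" 2>/dev/null)

if has_cap_sys_admin "$cap_eff"; then
  echo "CAP_SYS_ADMIN present"
else
  echo "CAP_SYS_ADMIN absent (the secret remount reported here may fail)"
fi
```

Running this inside the container before the build shows whether the container was started with --cap-add CAP_SYS_ADMIN.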

Steps to reproduce the issue:

# On the host (not running as root)
podman run --rm -it --device=/dev/fuse quay.io/buildah/stable:v1.26.4

# Within the container
export ENV_SECRET='abc123'
cat <<'EOF'  | buildah bud --secret id=ENV_SECRET,type=env -f - .
FROM alpine
RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
EOF

Describe the results you received:

The RUN step fails with operation not permitted.

STEP 1/2: FROM alpine
STEP 2/2: RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
error running subprocess: error remounting "/var/tmp/buildah1870670104/mnt/rootfs/run/secrets/ENV_SECRET" in mount namespace with expected flags: operation not permitted
error building at STEP "RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"": exit status 1

Describe the results you expected:

The build completes successfully.

Output of rpm -q buildah or apt list buildah:

buildah-1.26.4-2.fc36.x86_64

Output of buildah version:

Version:         1.26.4
Go Version:      go1.18.4
Image Spec:      1.0.2-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        1.0.0
libcni Version:  v1.1.2
image Version:   5.22.0
Git Commit:
Built:           Mon Aug  8 14:11:10 2022
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Output of podman version if reporting a podman build issue:

Client:       Podman Engine
Version:      4.1.1
API Version:  4.1.1
Go Version:   go1.17.12
Built:        Tue Aug  9 07:01:20 2022
OS/Arch:      linux/amd64

Output of cat /etc/*release:

$ cat /etc/os-release
NAME="AlmaLinux"
VERSION="9.0 (Emerald Puma)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.0 (Emerald Puma)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.0"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"

Output of uname -a:

Linux (hostname redacted) 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

(On the host)

# This file is is the configuration file for all tools
# that use the containers/storage library. The storage.conf file
# overrides all other storage.conf files. Container engines using the
# container/storage library do not inherit fields from other storage.conf
# files.
#
#  Note: The storage.conf file overrides other storage.conf files based on this precedence:
#      /usr/containers/storage.conf
#      /etc/containers/storage.conf
#      $HOME/.config/containers/storage.conf
#      $XDG_CONFIG_HOME/containers/storage.conf (If XDG_CONFIG_HOME is set)
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver, Must be set for proper operation.
driver = "overlay"

# Temporary storage location
runroot = "/run/containers/storage"

# Primary Read/Write location of container storage
# When changing the graphroot location on an SELINUX system, you must
# ensure  the labeling matches the default locations labels with the
# following commands:
# semanage fcontext -a -e /var/lib/containers/storage /NEWSTORAGEPATH
# restorecon -R -v /NEWSTORAGEPATH
graphroot = "/var/lib/containers/storage"


# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs.  Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"

# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file.  These ranges will be partitioned
# to containers configured to create automatically a user namespace.  Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the minimum size for a user namespace created automatically.
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids.  Note multiple UIDs will be
# squashed down to the default uid in the container.  These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = "false"

# Inodes is used to set a maximum inodes of the container image.
# inodes = ""

# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

# Set to skip a PRIVATE bind mount on the storage home directory.
# skip_mount_home = "false"

# Size is used to set a maximum size of the container image.
# size = ""

# ForceMask specifies the permissions mask that is used for new files and
# directories.
#
# The values "shared" and "private" are accepted.
# Octal permission masks are also accepted.
#
#  "": No value specified.
#     All files/directories, get set with the permissions identified within the
#     image.
#  "private": it is equivalent to 0700.
#     All files/directories get set with 0700 permissions.  The owner has rwx
#     access to the files. No other users on the system can access the files.
#     This setting could be used with networked based homedirs.
#  "shared": it is equivalent to 0755.
#     The owner has rwx access to the files and everyone else can read, access
#     and execute them. This setting is useful for sharing containers storage
#     with other users.  For instance have a storage owned by root but shared
#     to rootless users as an additional store.
#     NOTE:  All files within the image are made readable and executable by any
#     user on the system. Even /etc/shadow within your image is now readable by
#     any user.
#
#   OCTAL: Users can experiment with other OCTAL Permissions.
#
#  Note: The force_mask Flag is an experimental feature, it could change in the
#  future.  When "force_mask" is set the original permission mask is stored in
#  the "user.containers.override_stat" xattr and the "mount_program" option must
#  be specified. Mount programs like "/usr/bin/fuse-overlayfs" present the
#  extended attribute permissions to processes within containers rather then the
#  "force_mask"  permissions.
#
# force_mask = ""

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""

# metadata_size is used to set the `pvcreate --metadatasize` options when
# creating thin devices. Default is 128k
# metadata_size = ""

# Size is used to set a maximum size of the container image.
# size = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"

Aug 25 '22 17:08 lawsonjl-ornl

Thanks for reaching out, @lawsonjl-ornl!

@ashley-cui PTAL

Aug 26 '22 07:08 vrothberg

I think the reason is that secrets are mounted from the host filesystem for a particular RUN step and unmounted afterwards, so that they are not generally available across the rootfs of the working container. Any other method would require us to copy or add the secret into the rootfs, which does not look safe.

If I recall correctly, the additional --cap-add CAP_SYS_ADMIN is a must if we want to perform a remount inside the nested container (the container-inside-container scenario).

Thinking about it quickly, I don't think this can be fixed, since that is how secrets are supposed to be used inside a build container. I'll wait for others to comment, or I'll comment back here if I can think of a solution.

Aug 26 '22 08:08 flouthoc

So when I first wrote this issue, I was thinking that secrets provided via --secret src=... worked when running inside a podman container, but it actually appears that case fails too:

# On the host (not running as root)
podman run --rm -it --device=/dev/fuse quay.io/buildah/stable:v1.26.4

# Within the container
echo 'abc123' > /tmp/ENV_SECRET
cat <<'EOF'  | buildah bud --secret id=ENV_SECRET,src=/tmp/ENV_SECRET -f - .
FROM alpine
RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
EOF

STEP 1/2: FROM alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 213ec9aee27d done
Copying config 9c6f072447 done
Writing manifest to image destination
Storing signatures
STEP 2/2: RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
error running subprocess: error remounting "/var/tmp/buildah1144654328/mnt/rootfs/run/secrets/ENV_SECRET" in mount namespace with expected flags: operation not permitted
error building at STEP "RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"": exit status 1

If we instead run buildah in a rootless docker container, it does work, as long as we use the seccomp.json from podman instead of docker's defaults:

# On the host (not running as root, with a rootless docker daemon active)
docker run --rm -it --device=/dev/fuse --security-opt seccomp=podman-seccomp.json quay.io/buildah/stable:v1.26.4

# Within the container
echo 'abc123' > /tmp/ENV_SECRET
cat <<'EOF'  | buildah bud --secret id=ENV_SECRET,src=/tmp/ENV_SECRET -f - .
FROM alpine
RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
EOF

STEP 1/2: FROM alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 213ec9aee27d done
Copying config 9c6f072447 done
Writing manifest to image destination
Storing signatures
STEP 2/2: RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
SECRET: abc123
COMMIT
Getting image source signatures
Copying blob 994393dc58e7 skipped: already exists
Copying blob c5069376b2da done
Copying config 0cd5e430d2 done
Writing manifest to image destination
Storing signatures
--> 0cd5e430d2c
0cd5e430d2c2075e5971a0a86726b3251cb86e7196add2d2f71ba7bf8ad6c1c6

With --secret type=env it still fails, however:

# On the host (not running as root, with a rootless docker daemon active)
docker run --rm -it --device=/dev/fuse --security-opt seccomp=podman-seccomp.json quay.io/buildah/stable:v1.26.4

# Within the container
export ENV_SECRET='abc123'
cat <<'EOF'  | buildah bud --secret id=ENV_SECRET,type=env -f - .
FROM alpine
RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
EOF

STEP 1/2: FROM alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 213ec9aee27d done
Copying config 9c6f072447 done
Writing manifest to image destination
Storing signatures
STEP 2/2: RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"
error running subprocess: error remounting "/var/tmp/buildah776105798/mnt/rootfs/run/secrets/ENV_SECRET" in mount namespace with expected flags: operation not permitted
error building at STEP "RUN --mount=type=secret,id=ENV_SECRET echo "SECRET: $(cat /run/secrets/ENV_SECRET)"": exit status 1

I ran those two examples on a different system, where I have docker installed instead of podman, so here are its details:

$ uname -a
Linux (hostname redacted) 4.18.0-372.19.1.el8_6.x86_64 #1 SMP Mon Jul 18 11:14:02 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"

$ docker system info
Client:
 Context:    rootless
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 3
  Running: 2
  Paused: 0
  Stopped: 1
 Images: 12
 Server Version: 20.10.17
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: none
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.3-0-g6724737
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  rootless
 Kernel Version: 4.18.0-372.19.1.el8_6.x86_64
 Operating System: Red Hat Enterprise Linux 8.6 (Ootpa)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 125.5GiB
 Name: (hostname redacted)
 ID: DALB:S4IE:W44O:UGZX:SNPB:VH5L:M4OK:JOVX:7EL4:2PAO:W3G4:4O2W
 Docker Root Dir: /home/docker-rootless/.local/share/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

I tried, but I wasn't able to get docker running under a (non-privileged, rootless) podman or docker container in order to test whether their --secret support works in that type of environment. And testing with --privileged or --cap-add CAP_SYS_ADMIN would be pointless, of course.

So this is what I've observed in terms of --secret compatibility:

Runtime environment    --secret src=... (file)    --secret type=env
buildah in podman      No                         No
buildah in docker      Yes*                       No
docker in podman       ?                          ?
docker in docker       ?                          ?

*with podman's seccomp.json

Aug 26 '22 16:08 lawsonjl-ornl

A friendly reminder that this issue had no activity for 30 days.

Sep 26 '22 00:09 github-actions[bot]

@giuseppe PTAL

When running the buildah container, it might work better if you run it as the build user (--user build).
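
A minimal sketch of that suggestion, reusing the image and flags from the reproduction steps in this issue. The run_buildah_container helper and the DRY_RUN variable are hypothetical names introduced here for illustration; by default the script only prints the command. Whether running as the build user avoids the remount error was not confirmed in this thread.

```shell
#!/bin/sh
# Sketch: start the buildah image as its unprivileged "build" user.
# Image tag and --device flag are taken from the reproduction steps above.
IMAGE="quay.io/buildah/stable:v1.26.4"

# run_buildah_container USER (hypothetical helper):
# prints the podman command, or executes it when DRY_RUN=0.
run_buildah_container() {
  user="$1"
  set -- podman run --rm -it --user "$user" --device=/dev/fuse "$IMAGE"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_buildah_container build
```

Setting DRY_RUN=0 runs the container for real; the build steps from the original report can then be repeated inside it.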

Sep 26 '22 17:09 rhatdan

A friendly reminder that this issue had no activity for 30 days.

Oct 27 '22 00:10 github-actions[bot]

@giuseppe did you ever get a chance to check this out?

Oct 27 '22 17:10 rhatdan

opened a PR: https://github.com/containers/buildah/pull/4388

Oct 28 '22 12:10 giuseppe
