Failure to remove container after successful command run (runc regression?)

Open andresdelfino opened this issue 6 months ago • 10 comments

Contributing guidelines and issue reporting guide

Well-formed report checklist

  • [x] I have found a bug that the documentation does not mention anything about my problem
  • [x] I have found a bug that there are no open or closed issues that are related to my problem
  • [x] I have provided version/information about my environment and done my best to provide a reproducer

Description of bug

Bug description

Failure to remove container after successful command

I'm running a GitLab job that starts buildkitd using rootlesskit as a service and runs buildctl in another container.

This works on 0.17.3, but fails on 0.18.0.

Reproduction

Parameters to buildkitd: --oci-worker-no-process-sandbox

Dockerfile to reproduce the issue:

FROM debian:bookworm

RUN apt-get update

RUN apt-get install -y gpg

RUN mkdir --mode 700 /root/.gnupg

COPY <<EOF /key-params.txt
Key-Type: RSA
Key-Length: 4096
Subkey-Type: RSA
Subkey-Length: 4096
Name-Real: Your Name
Name-Email: [email protected]
Expire-Date: 2y
Passphrase: your-secure-passphrase
%commit
EOF

RUN gpg --batch --gen-key /key-params.txt

Error:

error: failed to solve: process "/bin/sh -c gpg --batch --gen-key /key-params.txt" did not complete successfully: buildkit-runc did not terminate successfully: exit status 1: unable to destroy container: unable to remove container's cgroup: rmdir /sys/fs/cgroup/ppispbbsap6h2q62rrceo2gtl: device or resource busy

Version information

buildctl github.com/moby/buildkit v0.18.0 95d190ef4f18b57c717eaad703b67cb2be781ebb
buildkitd github.com/moby/buildkit v0.18.0 95d190ef4f18b57c717eaad703b67cb2be781ebb
Docker version 28.3.0, build 38b7060
Host: Ubuntu 22.04.3 LTS

andresdelfino, Jun 25 '25 14:06
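
For anyone debugging the error in the report above: on cgroup v2 the kernel rejects rmdir on a cgroup directory with "device or resource busy" while the cgroup still has live processes or child cgroups. The following is a minimal sketch for checking which of the two applies, assuming the directory named in the error still exists and is readable from where buildkitd runs. The function name cgroup_busy_reason is hypothetical and is not part of BuildKit or runc, and the check by itself does not establish the cause of this issue.

```shell
#!/bin/bash
# Hypothetical helper (not part of BuildKit or runc): report what still
# occupies a cgroup v2 directory, since rmdir fails with EBUSY until the
# cgroup has no processes and no child cgroups.
cgroup_busy_reason() {
    local dir=$1 pids="" children="" entry

    # cgroup.procs lists one PID per line; join them on a single line.
    if [ -r "$dir/cgroup.procs" ]; then
        pids=$(paste -s -d ' ' "$dir/cgroup.procs")
    fi

    # Child cgroups show up as subdirectories.
    for entry in "$dir"/*/; do
        if [ -d "$entry" ]; then
            children="$children$(basename "$entry") "
        fi
    done

    if [ -n "$pids" ]; then echo "procs: $pids"; fi
    if [ -n "$children" ]; then echo "children: ${children% }"; fi
    if [ -z "$pids" ] && [ -z "$children" ]; then echo "empty"; fi
}
```

It would be called with the path from the error message, for example cgroup_busy_reason /sys/fs/cgroup/ppispbbsap6h2q62rrceo2gtl, from inside the buildkitd container.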

If this is a regression, can you complete https://github.com/moby/buildkit/blob/master/.github/issue_reporting_guide.md#regressions

tonistiigi, Jul 03 '25 00:07

The last release that works is 0.17.3.

Curiously, https://github.com/moby/buildkit/releases/tag/v0.18.0 says:

Runc container runtime has been updated to v1.2.2 https://github.com/moby/buildkit/pull/5532

And https://github.com/opencontainers/runc/releases/tag/v1.2.2 says:

Fixed the failure of runc delete on a rootless container with no dedicated cgroup on a system with read-only /sys/fs/cgroup mount. This is a regression in runc 1.2.0, causing a failure when using rootless buildkit. (https://github.com/opencontainers/runc/issues/4518, https://github.com/opencontainers/runc/pull/4531)

Perhaps the fix introduced the issue?

andresdelfino, Jul 22 '25 20:07

Note that 0.17.3 uses runc 1.1.15, so taking a look at https://github.com/opencontainers/runc/pull/4531 is not enough.

andresdelfino, Jul 23 '25 02:07
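
The runc version bundled with a given BuildKit image can be confirmed directly rather than read from release notes. A minimal sketch, assuming the image ships the runtime on PATH under the name buildkit-runc (the name that appears in the error message) and that it prints the usual "runc version X.Y.Z" first line. The function name bundled_runc_version is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the runc version bundled in a BuildKit image.
# Assumption: the image provides the runtime as "buildkit-runc" on PATH.
bundled_runc_version() {
    local image=$1
    # "runc --version" prints "runc version X.Y.Z" on its first line.
    docker run --rm --entrypoint buildkit-runc "$image" --version |
        awk '/^runc version / { print $3 }'
}
```

Running it against docker.io/moby/buildkit:v0.17.3-rootless and docker.io/moby/buildkit:v0.18.0-rootless would show whether the two releases differ in their runc version as described above.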

If this is a regression, can you complete https://github.com/moby/buildkit/blob/master/.github/issue_reporting_guide.md#regressions

Commit 4b36562e0 is the first one with the issue. Commit 13a1efb8f, the previous one, works fine.

andresdelfino, Jul 26 '25 19:07
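
A bisection like the one reported above can be automated with git bisect run, which treats exit status 0 as good, 125 as "skip this commit", and other statuses from 1 to 127 as bad. A minimal sketch of the step logic follows. The function name bisect_step and its two arguments are hypothetical placeholders for whatever builds the image at the current commit and runs the reproducer. They are not taken from the thread.

```shell
#!/bin/bash
# Hypothetical wrapper for "git bisect run": build_cmd and repro_cmd are
# placeholders for the commands that build the image and run the reproducer.
bisect_step() {
    local build_cmd=$1 repro_cmd=$2
    # 125 tells git bisect to skip a commit that cannot be tested.
    if ! $build_cmd; then return 125; fi
    # 0 marks the commit as good; 1 marks it as bad.
    if $repro_cmd; then return 0; fi
    return 1
}
```

With git bisect start 4b36562e0 13a1efb8f (bad commit first, then good), a script that calls bisect_step and exits with its status could be passed to git bisect run.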

I wrote this script to test, in case someone finds it useful:

#!/bin/bash
# Reproducer: rootless buildkitd in one container, buildctl in another.
set -eo pipefail

CLIENT_NAME=buildctl
DAEMON_NAME=buildkitd
DAEMON_PORT=2375
IMAGE=docker.io/moby/buildkit:v0.17.3-rootless
NETWORK=buildkitd-bug
SRC_PATH=/src

# Clean up leftovers from a previous run.
docker container rm -f "$DAEMON_NAME" "$CLIENT_NAME"
docker network rm -f "$NETWORK"

docker image pull "$IMAGE"

docker network create "$NETWORK"

# Start the daemon, listening on TCP inside the test network.
docker run \
    --name "$DAEMON_NAME" \
    --network "$NETWORK" \
    --privileged \
    --detach=true \
    --expose "$DAEMON_PORT" \
    "$IMAGE" \
        --addr "tcp://0.0.0.0:$DAEMON_PORT" \
        --oci-worker-no-process-sandbox

# Run the build without aborting on failure, so cleanup still happens.
# The dockerfile.v0 frontend reads the Dockerfile from the "dockerfile" local.
set +e
docker run \
    --name "$CLIENT_NAME" \
    --tty \
    --network "$NETWORK" \
    --interactive \
    --mount "type=bind,source=/home/adelfino/buildkitd_test,destination=$SRC_PATH" \
    --entrypoint /usr/bin/buildctl \
    "$IMAGE" \
    --addr "tcp://$DAEMON_NAME:$DAEMON_PORT" \
    build \
        --output type=image,name=perrito \
        --frontend dockerfile.v0 \
        --local "context=$SRC_PATH" \
        --local "dockerfile=$SRC_PATH"
status=$?
set -e

# Tear down and report the build's exit status.
docker container rm -f "$DAEMON_NAME" "$CLIENT_NAME"
docker network rm "$NETWORK"

exit $status

andresdelfino, Jul 26 '25 19:07

The first release of runc to make this fail is v1.2.0-rc.1.

andresdelfino, Jul 26 '25 21:07

Perhaps this is related? https://github.com/opencontainers/runc/pull/3825

andresdelfino, Jul 26 '25 21:07

v0.18.0-rootless works fine when using --security-opt seccomp=unconfined, --security-opt apparmor=unconfined and --security-opt systempaths=unconfined instead of --privileged and --oci-worker-no-process-sandbox:

#!/bin/bash
# Same test against v0.18.0, with --security-opt flags replacing
# --privileged and --oci-worker-no-process-sandbox.
set -eo pipefail

CLIENT_NAME=buildctl
DAEMON_NAME=buildkitd
DAEMON_PORT=2375
IMAGE=docker.io/moby/buildkit:v0.18.0-rootless
NETWORK=buildkitd-bug
SRC_PATH=/src

# Clean up leftovers from a previous run.
docker container rm -f "$DAEMON_NAME" "$CLIENT_NAME"
docker network rm -f "$NETWORK"

docker image pull "$IMAGE"

docker network create "$NETWORK"

# Start the daemon, listening on TCP inside the test network.
docker run \
    --name "$DAEMON_NAME" \
    --network "$NETWORK" \
    --security-opt seccomp=unconfined \
    --security-opt apparmor=unconfined \
    --security-opt systempaths=unconfined \
    --detach=true \
    --expose "$DAEMON_PORT" \
    "$IMAGE" \
        --addr "tcp://0.0.0.0:$DAEMON_PORT"

# Run the build without aborting on failure, so cleanup still happens.
set +e
docker run \
    --name "$CLIENT_NAME" \
    --tty \
    --network "$NETWORK" \
    --interactive \
    --mount "type=bind,source=/home/adelfino/buildkitd_test,destination=$SRC_PATH" \
    --entrypoint /usr/bin/buildctl \
    "$IMAGE" \
    --addr "tcp://$DAEMON_NAME:$DAEMON_PORT" \
    build \
        --output type=image,name=perrito \
        --frontend dockerfile.v0 \
        --local "context=$SRC_PATH" \
        --local "dockerfile=$SRC_PATH"
status=$?
set -e

# Tear down and report the build's exit status.
docker container rm -f "$DAEMON_NAME" "$CLIENT_NAME"
docker network rm "$NETWORK"

exit $status

andresdelfino, Jul 26 '25 23:07

cc @AkihiroSuda

thaJeztah, Sep 16 '25 18:09

Is there a known workaround for this issue? I’m seeing the same behavior in a similar environment. Is rolling back to 0.17.x currently the solution?

NilsKrattinger, Dec 10 '25 10:12