
[Bug] Unable to run container with --runtime=sysbox-runc when Docker data root is on an LVM

Open cprevosteau opened this issue 4 years ago • 4 comments

After installing Sysbox on Ubuntu Focal, either from the package release or from source, I cannot run a container with sysbox-runc. I always get the same error:

$ docker run --runtime=sysbox-runc hello-world
docker: Error response from daemon: OCI runtime create failed: container_linux.go:364: starting container process caused "process_linux.go:342: getting the final child's pid from pipe caused \"EOF\"": unknown.
ERRO[0000] error waiting for container: context canceled

With runc, the same command works.

I tried this without success.

$ uname -a
Linux charles 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 206
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc sysbox-runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-58-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.01GiB
 Name: charles
 ID: LREV:LW7X:THPL:CJY4:W7V2:3PDY:OQ4R:EPPX:QA5F:3BTJ:JKTI:7C6W
 Docker Root Dir: /home/clement/encrypted/system/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 172.25.0.0/16, Size: 24

WARNING: No swap limit support
WARNING: No blkio weight support
WARNING: No blkio weight_device support
$ sudo cat /etc/docker/daemon.json 
{
  "data-root": "/home/clement/encrypted/system/docker",
  "bip": "172.20.0.1/16",
  "runtimes": {
       "sysbox-runc": {
          "path": "/usr/local/sbin/sysbox-runc"
       }
   },
  "default-address-pools": [
    {
      "base": "172.25.0.0/16",
      "size": 24
    }
  ]
}

cprevosteau avatar Dec 18 '20 19:12 cprevosteau

Hi @cprevosteau, thanks for giving Sysbox a shot!

  1. Can you double check the sysbox-mgr and sysbox-fs daemons are running? (e.g., ps -fu root | grep sysbox)

  2. Can you double check the shiftfs module is present in the kernel? (lsmod | grep shiftfs)

For example, in my host these result in:

cesar@focal:~/nestybox/sysbox$ ps -fu root | grep sysbox
root     3137321       1  0 Dec16 pts/0    00:00:08 sysbox-mgr --log /var/log/sysbox-mgr.log
root     3137339       1  0 Dec16 pts/0    00:09:12 sysbox-fs --log /var/log/sysbox-fs.log

cesar@focal:~/nestybox/sysbox$ lsmod | grep shiftfs
shiftfs                28672  0
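The two checks above can also be wrapped in a small script. This is only a sketch: the function names (`has_module`, `daemon_running`, `sysbox_preflight`) are made up for illustration and are not part of Sysbox, and it assumes `pgrep`, `awk`, and `lsmod` are available on the host.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are hypothetical, not part of Sysbox.

# Succeeds if the lsmod-style listing on stdin contains a module named $1.
has_module() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Succeeds if a process whose name is exactly $1 is running.
daemon_running() {
  pgrep -x "$1" > /dev/null
}

# Print one status line per prerequisite mentioned in this thread.
sysbox_preflight() {
  local d
  for d in sysbox-mgr sysbox-fs; do
    if daemon_running "$d"; then
      echo "$d: running"
    else
      echo "$d: NOT running"
    fi
  done
  if lsmod | has_module shiftfs; then
    echo "shiftfs: loaded"
  else
    echo "shiftfs: NOT loaded"
  fi
}
```

After sourcing the file, run `sysbox_preflight`. On my host both daemons and the shiftfs module are present.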

And thus docker run --runtime=sysbox-runc hello-world works without problem:

cesar@focal:~/nestybox/sysbox$ docker run --runtime=sysbox-runc hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Thanks!

ctalledo avatar Dec 18 '20 19:12 ctalledo

Hi, thanks for the quick answer! Unfortunately, nothing unusual there:

$ ps -fu root | grep sysbox
root      177555    1690  0 19:36 pts/5    00:00:00 sysbox-mgr --log /var/log/sysbox-mgr.log
root      177580    1690  0 19:36 pts/5    00:00:06 sysbox-fs --log /var/log/sysbox-fs.log
$ lsmod | grep shiftfs
shiftfs                28672  0

cprevosteau avatar Dec 18 '20 19:12 cprevosteau

Would you mind joining the Sysbox Slack channel? It's easier to debug that way. We can post the resolution back into this GitHub issue once we find the problem.

The link to the Slack channel is at the bottom of this page: https://github.com/nestybox/sysbox#contact

Thanks!

ctalledo avatar Dec 18 '20 19:12 ctalledo

After debugging this with @cprevosteau (thanks!), we found out that the problem was that the Docker data-root (which is typically at /var/lib/docker on ext4) was configured to a different directory located on top of an LVM.

As far as I know there is nothing wrong with having the Docker data-root on top of an LVM, but for some reason (yet to be investigated) Sysbox is failing when creating the container in this case.

More specifically, when the Docker data-root is on top of an LVM, the container's root filesystem is also on top of that LVM, and this causes sysbox-runc to fail very early when creating the container's init process. Unfortunately, sysbox-runc does not provide much information on why the failure occurs. We only see getting the final child's pid from pipe caused "EOF", meaning that the container's init process died for some reason very soon after it was created.
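For anyone who wants to check whether their own setup matches this case, here is a rough sketch that reports what kind of block device backs the Docker data-root. It assumes `docker`, `findmnt`, and `lsblk` are installed; the function names `classify_backing` and `check_data_root` are hypothetical helpers, not something Sysbox or Docker provides. It only inspects the device directly under the filesystem, so stacked setups (for example LVM under an encrypted mapping) may need a manual look with plain `lsblk`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of Docker or Sysbox.

# Map an lsblk TYPE value to a human-readable description.
classify_backing() {
  case "$1" in
    lvm)       echo "LVM logical volume" ;;
    crypt)     echo "dm-crypt mapping" ;;
    part|disk) echo "plain partition or disk" ;;
    *)         echo "other ($1)" ;;
  esac
}

# Look up the Docker data-root, the device it is mounted from,
# and that device's type as reported by lsblk.
check_data_root() {
  local root src type
  root=$(docker info --format '{{.DockerRootDir}}') || return 1
  src=$(findmnt -n -o SOURCE --target "$root") || return 1
  type=$(lsblk -n -d -o TYPE "$src" 2>/dev/null)
  echo "$root is on $src: $(classify_backing "${type:-unknown}")"
}
```

After sourcing the file, run `check_data_root`. If it reports an LVM logical volume, you are likely in the situation described in this issue.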

Interestingly, we also noticed that Docker itself does not like the data-root on an LVM when configured with Docker userns-remap. That is, if the Docker data-root is on top of an LVM and "userns-remap": "<some-user>" is added to the /etc/docker/daemon.json file, restarting Docker fails. If the data-root is moved to an ext4 physical partition, restarting Docker works without problem. Thus, there appears to be some incompatibility in Docker itself between userns-remap and LVM.

ctalledo avatar Dec 19 '20 20:12 ctalledo