
"no space left on device" due to high inode usage

Open: ScottG489 opened this issue 3 years ago • 0 comments

On versions newer than v0.4.1 (so far v0.5.0 and v0.5.2), using sysbox seems to consume a larger number of inodes, which can fairly easily use up the system's entire inode allotment, after which no more containers can start.

Unfortunately, I wasn't able to reproduce this locally, but here's a small Terraform config that stands up a machine where the issue does reproduce. Just be sure to supply a public key you can use to SSH in.

resource "aws_instance" "instance" {
  ami           = "ami-09dd2e08d601bff67"
  instance_type = "t3.small"
  vpc_security_group_ids = [aws_security_group.security_group.id]
  key_name = aws_key_pair.key_pair.key_name
}

resource "aws_security_group" "security_group" {
  name = "sg_foo"
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_key_pair" "key_pair" {
  key_name   = "kp_foo"
  public_key = "<your key here>"
}

Then SSH into the machine and run the following:

export SYSBOX_VERSION=0.5.2 ; \
sudo apt-get update \
  && sudo apt-get -y install ca-certificates curl gnupg lsb-release \
  && sudo mkdir -p /etc/apt/keyrings \
  && curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg \
  && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null \
  && sudo apt-get update \
  && sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-compose-plugin \
  && wget https://downloads.nestybox.com/sysbox/releases/v${SYSBOX_VERSION}/sysbox-ce_${SYSBOX_VERSION}-0.linux_amd64.deb \
  && sudo apt-get install -y jq \
  && sudo apt-get install -y ./sysbox-ce_${SYSBOX_VERSION}-0.linux_amd64.deb \
  && sudo apt-get install -y linux-headers-$(uname -r)

To reproduce the actual error, you'll have to run a few containers without removing them:

for i in $(seq 1 50); do sudo docker run -it --runtime=sysbox-runc node:latest echo $i || break; done
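To watch inode usage climb while the loop runs, the loop can print the root filesystem's IUse% after each container. This is a minimal bash sketch, assuming GNU df's usual df -i column layout (Filesystem, Inodes, IUsed, IFree, IUse%, Mounted on) and assuming /var/lib/docker and /var/lib/sysbox live on the root filesystem. inode_pct and repro_with_inode_log are hypothetical helpers, not part of sysbox or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -i` output on stdin and print the IUse% value
# (without the % sign) for the filesystem mounted at "$1".
inode_pct() {
  awk -v mnt="$1" 'NR > 1 && $NF == mnt { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical wrapper around the reproduction loop above. Containers are still
# not removed, so inode usage keeps growing. -it is dropped because echo does
# not need a TTY.
repro_with_inode_log() {
  local i
  for i in $(seq 1 50); do
    sudo docker run --runtime=sysbox-runc node:latest echo "$i" || break
    echo "iteration $i: $(df -i / | inode_pct /)% of inodes used on /"
  done
}
```

After sourcing the file, run repro_with_inode_log; the logged percentage should jump on every iteration if the issue is reproducing.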

Inode usage should spike right away if the issue is reproducing, which can be seen with df -i. The machine usually runs out of inodes and fails before iteration 30 (at iteration 27, in my experience) with an error like this:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:292: failed to chown rootfs clone caused: failed to invoke ChownClonedRootfs via grpc: rpc error: code = Unknown desc = failed to chown cloned rootfs bottom mount at /var/lib/sysbox/rootfs/194c0a6559cab0de061561c4275408fb7331a5855b619f025f7d5cc6ccb99c65/bottom/merged by offset 165536, 165536: chown /var/lib/sysbox/rootfs/194c0a6559cab0de061561c4275408fb7331a5855b619f025f7d5cc6ccb99c65/bottom/merged/usr/share/ca-certificates/mozilla/CA_Disig_Root_R2.crt to 165536:165536 failed: no space left on device: unknown.
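Since the failure happens while sysbox chowns the cloned rootfs under /var/lib/sysbox, it may help to measure how many inodes that directory holds as containers accumulate. This is a minimal bash sketch, assuming GNU find (for -printf); count_inodes is a hypothetical helper. It de-duplicates on inode number, so hard links are counted once, and -xdev should keep find from descending into other mounted filesystems such as the overlay merged mount points.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the distinct inodes under directory "$1",
# staying on a single filesystem (-xdev). Hard links share an inode number
# (%i), so sorting unique inode numbers counts each file once.
count_inodes() {
  find "$1" -xdev -printf '%i\n' | sort -u | wc -l
}
```

On the host this would be run as root (sysbox's state directory is not world-readable), for example by sourcing the file in a root shell and calling count_inodes /var/lib/sysbox between iterations of the loop.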

A little more information about the system:

$ uname -a
Linux ip-<redacted> 5.4.0-1009-aws #9-Ubuntu SMP Sun Apr 12 19:46:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04 LTS
Release:        20.04
Codename:       focal

Installing shiftfs via the following seems to fix the issue:

sudo apt-get update \
  && sudo apt-get install -y make dkms git wget \
  && git clone -b k5.10 https://github.com/toby63/shiftfs-dkms.git shiftfs-k510 \
  && cd shiftfs-k510 \
  && git checkout k5.4 \
  && ./update1 \
  && sudo make -f Makefile.dkms \
  && modinfo shiftfs

The containers also seem to start and exit much more quickly.
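To confirm that the module is actually loaded after the DKMS build (modinfo only shows that it is installed), one option is to look for shiftfs in /proc/filesystems, where the kernel lists the filesystem types it currently has registered. This is a minimal bash sketch; fs_registered is a hypothetical helper, and it assumes the module has already been loaded, for example with sudo modprobe shiftfs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if filesystem type "$1" is registered with the
# running kernel, i.e. listed in /proc/filesystems (or in file "$2", for testing).
fs_registered() {
  local fstype=$1 table=${2:-/proc/filesystems}
  # Each line is "nodev<TAB>name" or "<TAB>name"; the name is the last field.
  awk -v t="$fstype" '$NF == t { found = 1 } END { exit !found }' "$table"
}

# Usage:
#   fs_registered shiftfs && echo "shiftfs is loaded"
```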

I'd also like to test this on a newer kernel and will do so as soon as I can.
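When moving to a newer kernel, one thing worth checking is whether it is at least 5.12, the release that introduced ID-mapped mounts. That sysbox can use ID-mapped mounts in place of shiftfs is my understanding of the sysbox docs and should be treated as an assumption here. This is a minimal bash sketch, assuming GNU coreutils sort (for -V and -C); kernel_at_least is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if kernel release "$2" (default: uname -r) is
# at least version "$1". Distro suffixes such as "-1009-aws" are stripped first.
kernel_at_least() {
  local want=$1 have=${2:-$(uname -r)}
  have=${have%%-*}
  # sort -V orders version strings; -C exits 0 only if the input is already
  # sorted, i.e. when want <= have.
  printf '%s\n%s\n' "$want" "$have" | sort -V -C
}

# Usage:
#   kernel_at_least 5.12 && echo "kernel is 5.12 or newer"
```

On the machine above (5.4.0-1009-aws) the check fails, which is consistent with needing shiftfs there.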

ScottG489, Jul 08 '22 21:07