"no space left on device" due to high inode usage
On versions newer than v0.4.1 (currently v0.5.0 and v0.5.2), using sysbox seems to consume more inodes than before, enough that it can fairly easily use up the system's entire inode allotment so that no more containers can start.
Unfortunately, I wasn't able to reproduce this locally, but here's a small Terraform config that stands up a machine on which the issue reproduces. Just be sure to supply a public key you can use to SSH in.
resource "aws_instance" "instance" {
  ami                    = "ami-09dd2e08d601bff67"
  instance_type          = "t3.small"
  vpc_security_group_ids = [aws_security_group.security_group.id]
  key_name               = aws_key_pair.key_pair.key_name
}
resource "aws_security_group" "security_group" {
  name = "sg_foo"
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_key_pair" "key_pair" {
  key_name   = "kp_foo"
  public_key = "<your key here>"
}
Then SSH into the machine and run the following:
export SYSBOX_VERSION=0.5.2 ; \
sudo apt-get update \
&& sudo apt-get -y install ca-certificates curl gnupg lsb-release \
&& sudo mkdir -p /etc/apt/keyrings \
&& curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null \
&& sudo apt-get update \
&& sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-compose-plugin \
&& wget https://downloads.nestybox.com/sysbox/releases/v${SYSBOX_VERSION}/sysbox-ce_${SYSBOX_VERSION}-0.linux_amd64.deb \
&& sudo apt-get install -y jq \
&& sudo apt-get install -y ./sysbox-ce_${SYSBOX_VERSION}-0.linux_amd64.deb \
&& sudo apt-get install -y linux-headers-$(uname -r)
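Before running the reproduction, it can help to confirm that the install actually took, so a failure later points at inode usage rather than at a broken setup. The sketch below uses hypothetical helper names (not part of sysbox or Docker) and assumes the sysbox-ce package registers a systemd unit named sysbox and adds sysbox-runc to Docker's configured runtimes; it lists runtime names through docker info's Go-template --format option.

```shell
# Hypothetical helpers, not shipped with sysbox or Docker.

# Succeed if NAME ($1) appears as a whole word in the space-separated LIST ($2).
runtime_listed() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check that the sysbox systemd unit is active (assumed unit name: sysbox)
# and that Docker lists sysbox-runc among its configured runtimes.
check_sysbox_install() {
  if ! systemctl is-active --quiet sysbox; then
    echo "sysbox.service is not active" >&2
    return 1
  fi
  # docker info exposes .Runtimes as a map; print its keys separated by spaces.
  runtimes=$(sudo docker info --format '{{range $k, $v := .Runtimes}}{{$k}} {{end}}')
  if ! runtime_listed sysbox-runc "$runtimes"; then
    echo "Docker does not list sysbox-runc (runtimes: $runtimes)" >&2
    return 1
  fi
  echo "sysbox is active; Docker runtimes: $runtimes"
}
```

Running check_sysbox_install right after the install step should print the runtime list; a non-zero exit means the setup, not the inode issue, needs attention first.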
To reproduce the actual error, you'll have to run a few containers without removing them:
for i in $(seq 1 50); do sudo docker run -it --runtime=sysbox-runc node:latest echo $i || break; done
If the issue is reproducing, inode usage should spike right away, which you can see with df -i. The system usually runs out of inodes and fails before iteration 30 (at iteration 27, in my experience) with an error like this:
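To watch the spike as it happens, the loop above can print inode usage before each container starts. This is a minimal sketch with hypothetical helper names; it assumes GNU df (the -P flag keeps each filesystem on one line, so IUse% is always the fifth field) and that /var/lib, which holds /var/lib/sysbox and, by default, /var/lib/docker, is the filesystem running out of inodes. It drops -it, which echo does not need, so the loop also works from a non-interactive script.

```shell
# Hypothetical helpers for watching inode usage during the reproduction loop.

# Read `df -Pi` output on stdin and print the IUse% of the first filesystem
# row (line 2), without the trailing "%".
iuse_pct_from_df() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print the inode usage percentage of the filesystem holding path $1.
iuse_pct() {
  df -Pi "$1" | iuse_pct_from_df
}

# Same loop as the reproduction step, logging inode usage of /var/lib before
# each run and stopping at the first container that fails to start.
run_repro_with_inode_log() {
  for i in $(seq 1 50); do
    echo "iteration $i: /var/lib inode usage $(iuse_pct /var/lib)%"
    sudo docker run --runtime=sysbox-runc node:latest echo "$i" || break
  done
}
```

On an affected version, run_repro_with_inode_log should show the percentage climbing with each iteration until the "no space left on device" error appears.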
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:292: failed to chown rootfs clone caused: failed to invoke ChownClonedRootfs via grpc: rpc error: code = Unknown desc = failed to chown cloned rootfs bottom mount at /var/lib/sysbox/rootfs/194c0a6559cab0de061561c4275408fb7331a5855b619f025f7d5cc6ccb99c65/bottom/merged by offset 165536, 165536: chown /var/lib/sysbox/rootfs/194c0a6559cab0de061561c4275408fb7331a5855b619f025f7d5cc6ccb99c65/bottom/merged/usr/share/ca-certificates/mozilla/CA_Disig_Root_R2.crt to 165536:165536 failed: no space left on device: unknown.
A little more information about the system:
$ uname -a
Linux ip-<redacted> 5.4.0-1009-aws #9-Ubuntu SMP Sun Apr 12 19:46:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal
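The kernel here is 5.4, which is presumably why the shiftfs step below checks out the k5.4 branch of shiftfs-dkms. The hypothetical helper below derives a branch name of that form from a kernel release string; it assumes branches follow the k<major>.<minor> pattern seen with k5.4 and k5.10, and a matching branch may not exist for every kernel series.

```shell
# Hypothetical helper: map a kernel release string such as "5.4.0-1009-aws"
# to a shiftfs-dkms branch name of the form k<major>.<minor> (e.g. "k5.4").
# Defaults to the running kernel. Assumes the k<major>.<minor> naming holds.
shiftfs_branch_for() {
  local release major rest minor
  release=${1:-$(uname -r)}
  major=${release%%.*}        # text before the first dot
  rest=${release#*.}          # text after the first dot
  minor=${rest%%[!0-9]*}      # leading digits of the remainder
  echo "k${major}.${minor}"
}
```

On the machine described above, shiftfs_branch_for with no argument would print k5.4 for the 5.4.0-1009-aws kernel.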
Installing shiftfs via the following seems to fix the issue:
sudo apt-get update \
&& sudo apt-get install -y make dkms git wget \
&& git clone -b k5.10 https://github.com/toby63/shiftfs-dkms.git shiftfs-k510 \
&& cd shiftfs-k510 \
&& git checkout k5.4 \
&& ./update1 \
&& sudo make -f Makefile.dkms \
&& modinfo shiftfs
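modinfo shiftfs only shows that the module is installed on disk; to check that it is actually loaded into the running kernel, one can look for it in /proc/modules, whose first column is the module name. A minimal sketch with hypothetical function names, not part of shiftfs-dkms or sysbox; it loads the module with modprobe if it is missing.

```shell
# Hypothetical helpers, not part of shiftfs-dkms or sysbox.

# Succeed if kernel module $1 is listed in modules file $2
# (defaults to /proc/modules, where column 1 is the module name).
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Load shiftfs if it isn't loaded yet, then report its status.
ensure_shiftfs_loaded() {
  if ! module_loaded shiftfs; then
    sudo modprobe shiftfs || return 1
  fi
  module_loaded shiftfs && echo "shiftfs is loaded"
}
```

Running ensure_shiftfs_loaded after the make step, before starting new containers, confirms the module is in the running kernel rather than only on disk.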
The containers seem to start and exit much quicker as well.
I'd also like to test this on a newer kernel and will do so as soon as I'm able.