kube-spawn

No space left on device /var/lib/machines (again)

Open: alban opened this issue 6 years ago (7 comments)

To Reproduce:

  • Install Fedora 28 from https://cloud.fedoraproject.org/ (GP2 image) on AWS:
    • m4.large
    • Disk: at least 50GiB
    • ssh: ssh -i ~/.ssh/$KEY fedora@$IP
  • Start a kube-spawn Kubernetes cluster on the AWS EC2 instance:
export KUBERNETES_VERSION=v1.9.9 # or other version
export KUBE_SPAWN_VERSION=master
sudo setenforce 0
sudo dnf install -y btrfs-progs git go iptables libselinux-utils polkit qemu-img systemd-container make docker
export GOPATH=$HOME/go
mkdir -p $GOPATH
curl -fsSL -O https://github.com/containernetworking/plugins/releases/download/v0.6.0/cni-plugins-amd64-v0.6.0.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xvf cni-plugins-amd64-v0.6.0.tgz
mkdir -p $GOPATH/src/github.com/kinvolk
cd $GOPATH/src/github.com/kinvolk
git clone https://github.com/kinvolk/kube-spawn.git
cd kube-spawn/
git checkout $KUBE_SPAWN_VERSION
make DOCKERIZED=n
sudo make install
sudo -E kube-spawn create --kubernetes-version $KUBERNETES_VERSION
sudo -E kube-spawn start --nodes=3

And I get the error:

Got 17% of https://alpha.release.flatcar-linux.net/amd64-usr/current/flatcar_developer_container.bin.bz2. 1min 31s left at 4.9M/s.
Failed to write file: Success
Failed to write file: Success
Failed to write file: No space left on device
Failed to retrieve image file. (Wrong URL?)
Exiting.
Failed to start cluster: error running machinectl pull-raw: exit status 1

/var/lib/machines is full.

Expected outcome

  • [ ] troubleshooting.md should explain how to increase the size of /var/lib/machines.
  • [ ] Issue https://github.com/kinvolk/kube-spawn/issues/66 was closed and PR https://github.com/kinvolk/kube-spawn/pull/70 was merged, but I still hit the problem.

alban, Jun 30 '18 13:06

Manually running the workaround suggested in #70 seems to work:

sudo umount /var/lib/machines
sudo qemu-img resize -f raw /var/lib/machines.raw $((10*1024*1024*1024))
sudo mount -t btrfs -o loop /var/lib/machines.raw /var/lib/machines
sudo btrfs filesystem resize max /var/lib/machines
sudo btrfs quota disable /var/lib/machines
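The same workaround can be wrapped in a small helper that first checks whether the loopback image exists, since (as noted further down in this thread) it is missing on a fresh install. This is a minimal sketch, assuming systemd-machined backs /var/lib/machines with the btrfs loopback file /var/lib/machines.raw as on the Fedora 28 host above; gib_to_bytes and grow_machines_pool are hypothetical names, and the commands are the ones from #70, run as root instead of via sudo.

```shell
#!/usr/bin/env bash
# Sketch of the #70 workaround as a reusable helper (must run as root).
# Assumption: machined stores images in a btrfs loopback file at
# /var/lib/machines.raw mounted on /var/lib/machines (Fedora 28 setup above).
# gib_to_bytes and grow_machines_pool are hypothetical helper names.
set -euo pipefail

RAW_IMAGE=${RAW_IMAGE:-/var/lib/machines.raw}
POOL_DIR=${POOL_DIR:-/var/lib/machines}

# qemu-img receives the new size in bytes, e.g. 10 GiB -> 10737418240
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

grow_machines_pool() {
    local size_gib=${1:-10}
    # The image only exists once machined has created it (e.g. after a
    # first, failed kube-spawn start), so stop early if it is missing.
    if [ ! -f "$RAW_IMAGE" ]; then
        echo "$RAW_IMAGE not found; nothing to resize" >&2
        return 1
    fi
    umount "$POOL_DIR" || true                      # tolerate "not mounted"
    qemu-img resize -f raw "$RAW_IMAGE" "$(gib_to_bytes "$size_gib")"
    mount -t btrfs -o loop "$RAW_IMAGE" "$POOL_DIR"
    btrfs filesystem resize max "$POOL_DIR"          # grow btrfs to fill the image
    btrfs quota disable "$POOL_DIR"
}
```

Usage, as root and only after the first kube-spawn run has created the image: source the file and call `grow_machines_pool 10`.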

alban, Jun 30 '18 14:06

It seems I cannot run the workaround on a fresh Fedora install, because /var/lib/machines.raw does not exist yet and /var/lib/machines is not a mounted btrfs filesystem. I first have to hit the error, then apply the workaround, and then run kube-spawn again. I guess that's why #70 didn't fix it.

$ sudo umount /var/lib/machines
umount: /var/lib/machines: not mounted.
$ sudo qemu-img resize -f raw /var/lib/machines.raw $((10*1024*1024*1024))
qemu-img: Could not open '/var/lib/machines.raw': Could not open '/var/lib/machines.raw': No such file or directory
$ sudo mount -t btrfs -o loop /var/lib/machines.raw /var/lib/machines
mount: /var/lib/machines: failed to setup loop device for /var/lib/machines.raw.
$ sudo btrfs filesystem resize max /var/lib/machines
ERROR: not a btrfs filesystem: /var/lib/machines
$ sudo btrfs quota disable /var/lib/machines
ERROR: not a btrfs filesystem: /var/lib/machines

alban, Jun 30 '18 14:06

@alban, can you describe the steps you took on a fresh Fedora system? I would like to add them to the docs.

schu, Jul 09 '18 14:07

@schu Do you mean the steps to work around the issue? See https://github.com/kinvolk/kube-spawn/issues/282 and search for the sections "First attempt to use kube-spawn" and "Workaround for 'no space left on device'".

alban, Jul 10 '18 19:07

You can run, e.g., 'sudo machinectl set-limit 20G' before launching the first machine; this sets the maximum size before machined creates the btrfs image.
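This suggestion can be folded into the reproduction steps from the top of the issue. A minimal sketch, assuming the 20G limit suggested here, the 3 nodes and v1.9.9 default from the report, and that no machine has been started yet; run, start_cluster, and the DRY_RUN switch are hypothetical additions so the commands can be printed instead of executed. Without an image name, `machinectl set-limit` changes the overall limit for all locally stored images.

```shell
#!/usr/bin/env bash
# Sketch: raise the machined pool limit before kube-spawn pulls its first
# image, so the btrfs loopback file is created large enough.
# Assumptions: 20G / 3 nodes / v1.9.9 are the example values from this thread;
# DRY_RUN, run and start_cluster are hypothetical names.
set -euo pipefail

# Print the command when DRY_RUN is set, otherwise execute it via sudo -E
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        sudo -E "$@"
    fi
}

start_cluster() {
    local limit=${1:-20G} nodes=${2:-3}
    # Must happen before the first machine (and thus the btrfs image) exists
    run machinectl set-limit "$limit"
    run kube-spawn create --kubernetes-version "${KUBERNETES_VERSION:-v1.9.9}"
    run kube-spawn start --nodes="$nodes"
}
```

For example, `DRY_RUN=1 start_cluster 20G 3` only prints the three commands; `start_cluster 20G 3` runs them.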

donbowman, Jul 23 '18 12:07

@donbowman Yes, we can document that approach.

Anyway, to fix this issue we need to merge https://github.com/kinvolk/kube-spawn/pull/283, which looks good to me. I'm thinking of merging it tomorrow if there are no objections. The documentation is still in progress, so I can address it in a follow-up PR.

dongsupark, Jul 23 '18 17:07

Hmm, I didn't mean to close this. I'll reopen it, since the documentation part is still unresolved.

dongsupark, Jul 24 '18 16:07