talos
talosctl: installing Rook on a Docker-provisioned cluster corrupts the host's LUKS partition
Bug Report
Description
- I provisioned a Talos cluster with Docker on Fedora 35:
talosctl cluster create --wait --extra-disks 1 --workers 3
- I followed this guide and installed Rook.
After I rebooted my machine, it no longer booted. All my partitions were intact except the LUKS partition, which had been reformatted as cephBluestore.
I didn't reproduce the issue since it would require going through the whole setup of my machine again. It's possible that I did something else that caused the problem.
- talosctl version
Client:
Tag: v1.0.1
SHA: 65d872ed
Built:
Go version: go1.17.8
OS/Arch: linux/amd64
- Platform: Fedora 35
The root cause is that talosctl cluster create does the equivalent of docker run --privileged, and that exposes host block devices to the container, which in turn exposes them to pods running on Kubernetes in Talos inside the container. So Rook can detect and mistakenly try to use a host block device.
This feels like a bug to me, and we should fix it. The problem is that I don't see an equivalent of --privileged via other options in the Docker API which would allow us to disable device passthrough.
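The exposure described above can be checked by hand. The following is a minimal sketch, not part of talosctl: list_block_nodes is a hypothetical helper name, and it only prints the block device nodes found directly under a directory (/dev by default).

```shell
# Hypothetical helper (not part of talosctl or Docker):
# print block device nodes found directly under a directory (default: /dev).
list_block_nodes() {
  dir="${1:-/dev}"
  for node in "$dir"/*; do
    # -b is true only for block special files (disks, partitions, loop devices).
    if [ -b "$node" ]; then
      printf '%s\n' "$node"
    fi
  done
}
```

Running the same loop inside a generic container that has a shell (for example an alpine image, which is an assumption here; the Talos container itself does not ship a shell) with and without --privileged should show the difference: without the flag, /dev normally holds only a small default set of character devices, while with it the host's disks are expected to show up.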
@smira Why wrap the Docker CLI tool in the first place? It's opaque and pretty dangerous, as it turns out here. Someone used to running Linux containers shouldn't be discouraged by having to issue a lengthy command line. In fact, they could use the compose (Docker Engine, Podman, nerdctl) and/or kube play (Podman) subcommands and you could define a sample spec in YAML in the docs, to keep it brief.
@smira
This feels like a bug to me, and we should fix it. The problem is that I don't see equivalent of --privileged via other options in the Docker API which would allow us to disable device passthrough.
Do I understand correctly that you want everything --privileged provides, but without the host's /dev/ (or, more specifically, its block devices) mounted inside the container? Would https://docs.docker.com/reference/cli/docker/container/run/#device-cgroup-rule help you restrict that?
But see also: https://github.com/siderolabs/talos/issues/4385#issuecomment-2058841449
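For reference, a device cgroup rule has the shape "<type> <major>:<minor> <permissions>". Below is a small sketch that only checks this shape; is_device_cgroup_rule is a hypothetical helper, not a Docker command. Note that the Docker documentation describes --device-cgroup-rule as adding a rule to the cgroup's allowed devices list, so whether it can take devices away from a --privileged container is not established in this thread.

```shell
# Hypothetical helper (not a Docker command): check that a string looks like
# a device cgroup rule of the form "<type> <major>:<minor> <perms>".
#   type:        a (all), b (block), c (char)
#   major/minor: a number or *
#   perms:       one to three of r (read), w (write), m (mknod)
is_device_cgroup_rule() {
  printf '%s\n' "$1" | grep -Eq '^[abc] ([0-9]+|\*):([0-9]+|\*) [rwm]{1,3}$'
}
```

A rule such as 'b 8:* rwm' (block major 8 is the sd disk driver on Linux) passes the check and could then be handed to docker run --device-cgroup-rule on a non-privileged container to allow only that class of devices.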