kubernetes-zfs-provisioner

Containerized SSH-less provisioner

Open · jp39 opened this issue on Sep 16, 2024 · 7 comments

Hi,

The documentation states:

Making a container image and creating ZFS datasets from a container is not exactly easy, as ZFS runs in the kernel. While it's possible to pass /dev/zfs to a container so it can create and destroy datasets within the container, sharing the volume with NFS does not work.

Setting the sharenfs property to anything other than off invokes exportfs(8), which also requires a running NFS server to reload its exports, and that is not the case in a container (see zfs(8)).

But most importantly: mounting /dev/zfs inside the provisioner container would mean that datasets are only created on the host where the container currently runs.

So, in order to "break out" of the container, the zfs calls are wrapped and redirected to another host over SSH. This requires SSH private keys to be mounted in the container for an SSH user with sufficient permissions to run zfs commands on the target host.
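
For context, that wrapper approach amounts to something like the following shim; the host name, user and key path are placeholders, not the project's actual defaults:

#!/bin/sh
# Hypothetical zfs shim: forward every zfs invocation to the storage host over SSH.
exec ssh -i /secrets/id_ed25519 zfs-admin@zfs-host zfs "$@"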

I spent some time working on a small proof of concept that shows it is possible to create ZFS datasets from within a container and have the container itself share the volumes over NFS. In addition, the volume mounts are visible to both the host and the container, making them shareable via HostPath as well.
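
To illustrate, the provisioning inside the container boils down to ordinary zfs commands such as the following; the pool, dataset and export options are placeholders:

# Run inside the privileged container, which has /dev/zfs passed through.
zfs create -o mountpoint=/tank/kubernetes/pvc-example tank/kubernetes/pvc-example
# Setting sharenfs invokes exportfs(8); this works here because the container
# runs its own NFS server (see entrypoint.sh below).
zfs set sharenfs='rw=@10.0.0.0/8' tank/kubernetes/pvc-example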

I'm using this Dockerfile:

FROM docker.io/library/alpine:3.20 as runtime

ENTRYPOINT ["/entrypoint.sh"]

RUN apk add bash zfs nfs-utils

COPY kubernetes-zfs-provisioner /usr/bin/
COPY entrypoint.sh /

With this entrypoint.sh:

#!/bin/sh

# Start the RPC services that the kernel NFS server needs, on fixed ports so
# they can be exposed as containerPorts in the Deployment below.
rpcbind
rpc.statd --no-notify --port 32765 --outgoing-port 32766
rpc.mountd --port 32767
rpc.idmapd
# Start the NFS server itself (8 threads) on the standard port 2049.
rpc.nfsd --tcp --udp --port 2049 8

# Hand over PID 1 to the provisioner.
exec /usr/bin/kubernetes-zfs-provisioner
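
The image is then built and pushed in the usual way; the tag matches the Deployment below, and the kubernetes-zfs-provisioner binary is assumed to have been compiled into the build context beforehand (as the COPY line requires):

docker build -t jp39/zfs:latest .
docker push jp39/zfs:latest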

The secret sauce is to use mountPropagation: Bidirectional for the dataset volume mount, so that each dataset mounted by the container is also visible on the host and vice versa:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zfs-provisioner
  namespace: zfs-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: zfs-provisioner
  template:
    metadata:
      labels:
        app.kubernetes.io/name: zfs-provisioner
      namespace: zfs-system
    spec:
      serviceAccountName: zfs-provisioner
      containers:
      - name: provisioner
        image: jp39/zfs:latest
        volumeMounts:
        - name: dev-zfs
          mountPath: /dev/zfs
        - name: dataset
          mountPath: /tank/kubernetes
          mountPropagation: Bidirectional
        securityContext:
          privileged: true
          procMount: Unmasked
        ports:
        - containerPort: 2049
          protocol: TCP
        - containerPort: 111
          protocol: UDP
        - containerPort: 32765
          protocol: UDP
        - containerPort: 32767
          protocol: UDP
        env:
        - name: ZFS_NFS_HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      volumes:
      - name: dev-zfs
        hostPath:
          path: /dev/zfs
      - name: dataset
        hostPath:
          path: /tank/kubernetes
      nodeSelector:
        kubernetes.io/hostname: zfsnode
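
To sanity-check the NFS side, the export can be queried and mounted directly against the pod IP from any node; the IP and dataset below are placeholders:

# 10.42.0.15 stands in for the provisioner pod's IP (see ZFS_NFS_HOSTNAME above).
showmount -e 10.42.0.15
# Manually mount a provisioned dataset to confirm the export is reachable.
mount -t nfs 10.42.0.15:/tank/kubernetes/pvc-example /mnt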

Note that I had to make a small patch to kubernetes-zfs-provisioner so that the pod IP address (contained in the ZFS_NFS_HOSTNAME environment variable) is used as the NFSVolumeSource's server address instead of the storage class's hostname parameter.
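
With that patch in place, a provisioned PersistentVolume should point at the pod IP rather than the hostname parameter, which can be checked with something like this (the PV name is a placeholder):

# The PV name below stands in for an actually provisioned volume.
kubectl get pv pvc-example -o jsonpath='{.spec.nfs.server}{"\n"}'
# Expected output: the provisioner pod's IP, not the storage class's hostname.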

Is this something that would be worth having as a default configuration? It requires the ZFS host to be part of the cluster, but has the advantage of not requiring extra setup such as SSH keys.

jp39 · Sep 16 '24 09:09