vultr-csi [BUG] - unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock not accessible (for rook.io)

Describe the bug Using rook.io, the rook pods rook-ceph-osd-prepare- fails to setup a PersistentVolumeClaim.

"describe pod" finally reports the event (warning) "MapVolume.SetUpDevice failed for volume "pvc-c869a0057b0c4904" : kubernetes.io/csi: blockMapper.stageVolumeForBlock failed to check STAGE_UNSTAGE_VOLUME capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock: connect: connection refused"" from the kubelet.

To Reproduce Steps to reproduce the behavior: (NOTE: This setup works fine on Azure AKS, with only the storageClassName adjusted.)

Create a fresh kubernetes cluster (probably 1 worker node is sufficient for reproduction)
For basic Rook setup: From the files in https://github.com/rook/rook/tree/master/deploy/examples, run kubectl apply -f for crds.yaml, common.yaml and operator.yaml, this creates CRDs, Roles and a rook-ceph-operator deployment/pod
Run kubectl apply -f for the following CephCluster yaml:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # NOTE: see cluster.yaml in <https://github.com/rook/rook.git> for up-to-date image version 
    image: quay.io/ceph/ceph:v17.2.1
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
    allowMultiplePerNode: false
    modules:
      - name: pg_autoscaler
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  storage:
   storageClassDeviceSets:
    - name: set1
      # NOTE: change this to the number of nodes that should host an OSD
      count: 1
      portable: false
      tuneDeviceClass: false
      encrypted: false
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          storageClassName: vultr-block-storage-hdd
          accessModes:
            - ReadWriteOnce
          # NOTE: rook seems to expect a raw, unmounted device "volumeMode: Block"
          volumeMode: Block
          resources:
            requests:
              storage: 40Gi

See events in kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-* after the pod(s) got stuck.

Expected behavior The pods rook-ceph-osd-prepare-* should disappear after a short time, and instead corresponding rook-ceph-osd-* pods (without -prepare-) should remain.

Additional context I was using VKE with Kubernetes 1.23.x.

Jul 18 '22 21:07 defaultbranch

@defaultbranch

#NOTE: rook seems to expect a raw, unmounted device "volumeMode: Block" volumeMode: Block

IIRC the vultr-csi doesn't support a raw unmounted device

https://github.com/vultr/vultr-csi/blob/master/driver/mounter.go#L113

Jul 18 '22 23:07 ddymko

Hi, I have encountered the same situation. I have a question about this matter.

do you plan to support volumeMode: Block in the future?
is it possible to support volumeMode: Block by changing this CSI implementation?

Thank you in advance.

Mar 08 '24 14:03 kaznak

I have a question about this matter.

do you plan to support volumeMode: Block in the future?

is it possible to support volumeMode: Block by changing this CSI implementation?

Unofficially: I'd expect this should be possible (with changes). When attaching block devices to vultr instances normally, you get the raw device and can do stuff to them (create LVM volumes, basic filesystems, use LUKS, etc).

Mar 11 '24 12:03 cuppett

vultr-csi vultr-csi copied to clipboard

[BUG] - unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock not accessible (for rook.io)

vultr-csi
vultr-csi copied to clipboard