vultr-csi
[BUG] - unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock not accessible (for rook.io)
Describe the bug
Using rook.io, the rook-ceph-osd-prepare- pods fail to set up a PersistentVolumeClaim.
"kubectl describe pod" eventually reports the following warning event from the kubelet:
MapVolume.SetUpDevice failed for volume "pvc-c869a0057b0c4904" : kubernetes.io/csi: blockMapper.stageVolumeForBlock failed to check STAGE_UNSTAGE_VOLUME capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock: connect: connection refused"
To Reproduce
Steps to reproduce the behavior (NOTE: this setup works fine on Azure AKS, with only the storageClassName adjusted):
- Create a fresh Kubernetes cluster (1 worker node is probably sufficient for reproduction).
- For the basic Rook setup, run kubectl apply -f on crds.yaml, common.yaml and operator.yaml from https://github.com/rook/rook/tree/master/deploy/examples. This creates the CRDs, roles and a rook-ceph-operator deployment/pod.
- Run kubectl apply -f for the following CephCluster yaml:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # NOTE: see cluster.yaml in <https://github.com/rook/rook.git> for up-to-date image version
    image: quay.io/ceph/ceph:v17.2.1
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
    allowMultiplePerNode: false
    modules:
      - name: pg_autoscaler
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  storage:
    storageClassDeviceSets:
      - name: set1
        # NOTE: change this to the number of nodes that should host an OSD
        count: 1
        portable: false
        tuneDeviceClass: false
        encrypted: false
        volumeClaimTemplates:
          - metadata:
              name: data
            spec:
              storageClassName: vultr-block-storage-hdd
              accessModes:
                - ReadWriteOnce
              # NOTE: rook seems to expect a raw, unmounted device "volumeMode: Block"
              volumeMode: Block
              resources:
                requests:
                  storage: 40Gi
- See the events in kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-* after the pod(s) get stuck.
Expected behavior
The rook-ceph-osd-prepare-* pods should disappear after a short time, and the corresponding rook-ceph-osd-* pods (without -prepare-) should remain instead.
Additional context
I was using VKE with Kubernetes 1.23.x.
@defaultbranch
# NOTE: rook seems to expect a raw, unmounted device "volumeMode: Block"
volumeMode: Block
IIRC the vultr-csi doesn't support a raw unmounted device
https://github.com/vultr/vultr-csi/blob/master/driver/mounter.go#L113
Hi, I have encountered the same situation. I have a question about this matter.
- Do you plan to support volumeMode: Block in the future?
- Is it possible to support volumeMode: Block by changing this CSI implementation?
Thank you in advance.
I have a question about this matter.
- Do you plan to support volumeMode: Block in the future?
- Is it possible to support volumeMode: Block by changing this CSI implementation?
Unofficially: I'd expect this to be possible (with changes). When block devices are attached to Vultr instances normally, you get the raw device and can do what you like with it (create LVM volumes, basic filesystems, use LUKS, etc.).