data pool for metadata pool isn't found
Describe the bug
Creating a PVC using a StorageClass with different data & metadata pools fails.
rbd_util.go:1641] ID: 27 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting image options on my-rbd-repl/my-vol-cd61239b-5756-4b4a-be8b-63dff4c31b58, data pool %!s(MISSING)my-rbd
The two pools are there:
ceph df | grep my-rbd
my-rbd 17 32 8 KiB 473 12 KiB 0 6.5 TiB
my-rbd-repl 19 32 19 B 5 8 KiB 0 5 TiB
The storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: 'false'
provisioner: rbd.csi.ceph.com
parameters:
  pool: my-rbd-repl
  dataPool: my-rbd
  clusterID: ....
  volumeNamePrefix: my-vol-
  imageFeatures: layering
  imageFormat: "2"
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: default
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
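For completeness, the PVC used for this test, reconstructed from the CreateVolume request in the logs below (the accessModes value is inferred from access_mode 1, i.e. single-node writer):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc
  namespace: ceph-csi-rbd
spec:
  storageClassName: csi-rbd-sc
  volumeMode: Block        # the CreateVolume request uses AccessType Block
  accessModes:
    - ReadWriteOnce        # inferred from access_mode 1 in the request
  resources:
    requests:
      storage: 50Mi        # 52428800 bytes in the request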
Environment details
- Image/version of Ceph CSI driver : quay.io/cephcsi/cephcsi:v3.13.0
- Helm chart version : ceph-csi-rbd-3.13.0
- Kernel version : 5.14
- Mounter used for mounting PVC (for cephFS its fuse or kernel, for rbd its krbd or rbd-nbd) :
- Kubernetes cluster version : 1.27.4
- Ceph cluster version : 19.2.0
Steps to reproduce
Steps to reproduce the behavior:
- Configure the ceph-csi StorageClass with both metadata and data pools (a sketch of how such a pool pair is typically created is shown after this list):
  - my-rbd is an erasure coded pool
  - my-rbd-repl is a replicated pool
- If I use only "my-rbd-repl" in the StorageClass, there is no problem (just inefficient disk usage).
- If I use only "my-rbd" in the StorageClass, I get a different error: it wants a replicated pool for the metadata.
- The error above occurs when I try to use a different pool for each of metadata and data.
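For reference, this is roughly how such a pool pair is typically created (an assumption, not the exact commands used on this cluster; the pool names and PG counts mirror the ceph df output above). Note that using an erasure coded pool as an RBD data pool requires allow_ec_overwrites:
# assumed setup, adjust to your cluster
ceph osd pool create my-rbd 32 32 erasure
ceph osd pool set my-rbd allow_ec_overwrites true   # needed for RBD data on EC pools
ceph osd pool application enable my-rbd rbd
ceph osd pool create my-rbd-repl 32 32 replicated
rbd pool init my-rbd-repl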
Actual results
I get an error implying that ceph-csi can't find the data pool.
Expected behavior
For ceph-csi to use my-rbd-repl for metadata and my-rbd for data
Logs
If the issue is in PVC creation, deletion, or cloning, please attach complete logs of the containers below.
This is from the provisioner that's doing the work:
I0127 08:37:11.457540 1 utils.go:266] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c GRPC call: /csi.v1.Controller/CreateVolume
I0127 08:37:11.457919 1 utils.go:267] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c GRPC request: {"capacity_range":{"required_bytes":52428800},"name":"pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c","parameters":{"clusterID":"fd9c1e26-da6e-11ef-8593-3cecef103636","csi.storage.k8s.io/pv/name":"pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c","csi.storage.k8s.io/pvc/name":"raw-block-pvc","csi.storage.k8s.io/pvc/namespace":"ceph-csi-rbd","dataPool":"my-rbd","imageFeatures":"layering","imageFormat":"2","pool":"my-rbd-repl","volumeNamePrefix":"my-vol-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":1}}]}
I0127 08:37:11.458319 1 rbd_util.go:1387] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting disableInUseChecks: false image features: [layering] mounter: rbd
I0127 08:37:11.459737 1 omap.go:89] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c got omap values: (pool="my-rbd-repl", namespace="", name="csi.volumes.default"): map[]
I0127 08:37:11.465571 1 omap.go:159] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c set omap keys (pool="my-rbd-repl", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c:58568a22-4043-4326-84ee-62d860bdf19d])
I0127 08:37:11.467524 1 omap.go:159] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c set omap keys (pool="my-rbd-repl", namespace="", name="csi.volume.58568a22-4043-4326-84ee-62d860bdf19d"): map[csi.imagename:my-vol-58568a22-4043-4326-84ee-62d860bdf19d csi.volname:pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c csi.volume.owner:ceph-csi-rbd])
I0127 08:37:11.467548 1 rbd_journal.go:515] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c generated Volume ID (0001-0024-fd9c1e26-da6e-11ef-8593-3cecef103636-0000000000000013-58568a22-4043-4326-84ee-62d860bdf19d) and image name (my-vol-58568a22-4043-4326-84ee-62d860bdf19d) for request name (pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c)
I0127 08:37:11.467596 1 rbd_util.go:437] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c rbd: create my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d size 50M (features: [layering]) using mon 10.0.1.1:6789,10.0.1.2:6789,10.0.1.3:6789
I0127 08:37:11.467650 1 rbd_util.go:1641] ID: 77 **Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting image options on my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d, data pool %!s(MISSING)my-rbd**
E0127 08:37:11.480323 1 controllerserver.go:749] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c failed to create volume: failed to create rbd image: rbd: ret=-22, Invalid argument
I0127 08:37:11.484437 1 omap.go:126] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c removed omap keys (pool="my-rbd-repl", namespace="", name="csi.volumes.default"): [csi.volume.pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c]
E0127 08:37:11.484478 1 utils.go:271] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c GRPC error: rpc error: code = Internal desc = failed to create rbd image: rbd: ret=-22, Invalid argument
This is a debug message, and its formatting looks broken:
I0127 08:37:11.467650 1 rbd_util.go:1641] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting image options on my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d, data pool %!s(MISSING)my-rbd
It comes from this line:
https://github.com/ceph/ceph-csi/blob/935027f0d082736f367a6bc8e253769bcf497178/internal/rbd/rbd_util.go#L1606
There is a %s marker in the logMsg, which should not be there. It causes the %!s(MISSING) part in the output.
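A minimal Go sketch of the formatting glitch (an illustration of the mismatched format verb, not the literal ceph-csi code):
package main

import "fmt"

func main() {
	image := "my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d"
	dataPool := "my-rbd"

	// The data pool name is concatenated into the format string, but the string
	// still contains a %s verb with no matching argument, so fmt substitutes
	// %!s(MISSING) for it, exactly as seen in the provisioner log.
	logMsg := "setting image options on %s, data pool %s" + dataPool
	fmt.Printf(logMsg+"\n", image)
	// prints: setting image options on my-rbd-repl/my-vol-58568a22-..., data pool %!s(MISSING)my-rbd
}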
That also means that setting the data pool did not fail, as the debug log message is only written at the end of the function, and only when no failure has occurred.
The real problem seems to be this:
E0127 08:37:11.480323 1 controllerserver.go:749] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c failed to create volume: failed to create rbd image: rbd: ret=-22, Invalid argument
This happens at the time of the image creation:
https://github.com/ceph/ceph-csi/blob/15ffa4808276f231d25cabe37173dfc48495a4fe/internal/rbd/rbd_util.go#L456-L459
It is not clear which image option could be invalid. The dataPool option is something we test with an erasure coded pool in our e2e tests, which run for every PR, so we can be quite confident that it works in general. There must be something else in your environment that causes the RBD image creation to fail. Can you check the following:
- do the credentials for the provisioner have access to both pools?
- can you create an image manually with the same configuration (for example, with the rbd commands sketched below)?
- are there any logs on the Ceph side about the failure (in the OSDs, or maybe MONs)?
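For the second point, something along these lines should reproduce what ceph-csi attempts (run it with the same credentials the provisioner uses; the image name is only an example):
rbd create --size 50M \
    --image-format 2 --image-feature layering \
    --data-pool my-rbd \
    my-rbd-repl/test-manual-image
rbd info my-rbd-repl/test-manual-image
rbd rm my-rbd-repl/test-manual-image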