kubernetes-csi-tencentcloud

CSI drivers on OpenShift cluster

Open innerforce opened this issue 8 months ago • 2 comments

Hello, we had the Red Hat team test our Tencent Cloud driver, and here is the result of their investigation:

Below is the response I got from the Red Hat team. I tried to install the Tencent Cloud CSI drivers on an OpenShift cluster on Tencent Cloud. Unfortunately, none of them worked.

The steps I followed are below:

  1. Download the images from TCR and push them to the disconnected Quay registry:
curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-provisioner.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-attacher.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-snapshotter.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-snapshot-controller.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-resizer.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-tencentcloud-cbs.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-node-driver-registrar.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-tencentcloud-cfs.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-tencentcloud-cos.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/tcs-csi-tencentcloud-cos-launcher.tar
 curl -O https://rhocp-41711-1332667311.cos.ap-nanjing.myqcloud.com/tcs-images/busybox.tar
podman load -i tcs-csi-provisioner.tar
 podman load -i tcs-csi-attacher.tar
 podman load -i tcs-csi-snapshotter.tar
 podman load -i tcs-snapshot-controller.tar
 podman load -i tcs-csi-resizer.tar
 podman load -i tcs-csi-tencentcloud-cbs.tar
 podman load -i tcs-csi-node-driver-registrar.tar
 podman load -i tcs-csi-tencentcloud-cfs.tar
 podman load -i tcs-csi-tencentcloud-cos.tar
 podman load -i tcs-csi-tencentcloud-cos-launcher.tar
 podman load -i busybox.tar
podman tag ccr.ccs.tencentyun.com/tkeimages/csi-provisioner:v2.0.4 quay.ocp4.example.com:8443/tcr/tkeimages/csi-provisioner:v2.0.4
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-attacher:v3.0.2 quay.ocp4.example.com:8443/tcr/tkeimages/csi-attacher:v3.0.2
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-snapshotter:v3.0.2 quay.ocp4.example.com:8443/tcr/tkeimages/csi-snapshotter:v3.0.2
 podman tag ccr.ccs.tencentyun.com/tkeimages/snapshot-controller:v3.0.2 quay.ocp4.example.com:8443/tcr/tkeimages/snapshot-controller:v3.0.2
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-resizer:v1.0.1 quay.ocp4.example.com:8443/tcr/tkeimages/csi-resizer:v1.0.1
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-tencentcloud-cbs:v2.3.3 quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cbs:v2.3.3
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-node-driver-registrar:v2.0.1 quay.ocp4.example.com:8443/tcr/tkeimages/csi-node-driver-registrar:v2.0.1
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-tencentcloud-cfs:v2.0.6 quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cfs:v2.0.6
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-tencentcloud-cos:v2.0.2 quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cos:v2.0.2
 podman tag ccr.ccs.tencentyun.com/tkeimages/csi-tencentcloud-cos-launcher:v2.0.2 quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cos-launcher:v2.0.2
 podman tag docker.io/library/busybox:stable-glibc quay.ocp4.example.com:8443/tcr/busybox:latest
podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-provisioner:v2.0.4
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-attacher:v3.0.2
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-snapshotter:v3.0.2
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/snapshot-controller:v3.0.2
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-resizer:v1.0.1
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cbs:v2.3.3
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-node-driver-registrar:v2.0.1
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cfs:v2.0.6
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cos:v2.0.2
 podman push quay.ocp4.example.com:8443/tcr/tkeimages/csi-tencentcloud-cos-launcher:v2.0.2
 podman push quay.ocp4.example.com:8443/tcr/busybox:latest
  2. Apply the ImageTagMirrorSet and switch the default project to kube-system:
oc apply -f tcr-itms.yaml
oc project kube-system

You have to replace "quay.ocp4.example.com:8443" with your Quay registry URL in the tcr-itms.yaml file.

  3. Deploy the CBS CSI driver, following https://github.com/TencentCloud/kubernetes-csi-tencentcloud/blob/master/docs/README_CBS.md :
oc apply -f cbs-secret.yaml
oc apply -f cbs-csi-node-rbac.yaml
oc apply -f cbs-csi-node.yaml
oc apply -f cbs-csi-controller-rbac.yaml
oc apply -f cbs-csi-controller.yaml
oc apply -f cbs-storageclass.yaml
oc apply -f cbs-test-pvc.yaml
oc apply -f cbs-test-pod.yaml

You have to replace the values of TENCENTCLOUD_CBS_API_SECRET_ID and TENCENTCLOUD_CBS_API_SECRET_KEY in the cbs-secret.yaml file with the base64-encoded strings of your Tencent Cloud secret ID and key. The test-cbs-pvc and its PV stayed pending. It looked like the csi-cbs-node pods were not stable and the CSI socket could not be connected:

oc get storageclass
 NAME                   PROVISIONER                RECLAIMPOLICY  VOLUMEBINDINGMODE  ALLOWVOLUMEEXPANSION  AGE
 cbs-csi                com.tencent.cloud.csi.cbs  Delete         Immediate          false                 5m16s
oc get pvc
 NAME          STATUS   VOLUME  CAPACITY  ACCESS MODES  STORAGECLASS  VOLUMEATTRIBUTESCLASS  AGE
 test-cbs-pvc  Pending                                     cbs-csi       <unset>                16m
oc get pv
 No resources found
oc get pod
 NAME                                READY  STATUS            RESTARTS       AGE
 cbs-test-app                        0/1    Pending           0              2s
 csi-cbs-controller-fc5f44946-8kgs8  6/6    Running           0              38m
 csi-cbs-node-64cnw                  2/2    Running           5 (3m10s ago)  22m
 csi-cbs-node-94wv6                  2/2    Running           2 (4m26s ago)  22m
 csi-cbs-node-gtmpt                  2/2    Running           5 (2m59s ago)  22m
 csi-cbs-node-lm9s7                  1/2    CrashLoopBackOff  4 (72s ago)    22m
 csi-cbs-node-ntgtd                  2/2    Running           3 (4m43s ago)  22m
 csi-cbs-node-znrmj                  2/2    Running           3 (104s ago)   21m
oc logs csi-cbs-node-lm9s7 -c cbs-csi
 I0406 06:40:40.610723  27585 main.go:44] Building kube configs for running in cluster...
 I0406 06:40:40.612635  27585 driver.go:48] Driver: com.tencent.cloud.csi.cbs version: v2.3.3
oc logs csi-cbs-node-lm9s7 -c driver-registrar
 I0406 06:17:43.520604  18048 main.go:112] Version: v2.0.1
 I0406 06:17:43.520642  18048 main.go:122] Attempting to open a gRPC connection with: "/csi/csi.sock"
 I0406 06:17:43.520652  18048 connection.go:151] Connecting to unix:///csi/csi.sock
 W0406 06:17:53.520744  18048 connection.go:170] Still connecting to unix:///csi/csi.sock
  4. Deploy the CFS CSI driver, following https://github.com/TencentCloud/kubernetes-csi-tencentcloud/blob/master/docs/README_CFS.md :
oc apply -f cfs-secret.yaml
oc apply -f cfs-csi-rbac.yaml
oc apply -f cfs-csi-driver.yaml
oc apply -f cfs-csi-nodeplugin.yaml
oc apply -f cfs-csi-provisioner.yaml
oc apply -f cfs-storageclass.yaml
oc apply -f cfs-test-pvc.yaml
oc apply -f cfs-test-pod.yaml

You have to replace the values of TENCENTCLOUD_CFS_API_SECRET_ID and TENCENTCLOUD_CFS_API_SECRET_KEY in the cfs-secret.yaml file with the base64-encoded strings of your Tencent Cloud secret ID and key. The test-cfs-pvc and its PV stayed pending, even after I created the CFS service manually. It looked like this:

oc get storageclass
 NAME                   PROVISIONER                RECLAIMPOLICY  VOLUMEBINDINGMODE  ALLOWVOLUMEEXPANSION  AGE
 cfs-csi                com.tencent.cloud.csi.cfs  Delete         Immediate          false                 3m47s
oc get pvc
 NAME          STATUS   VOLUME  CAPACITY  ACCESS MODES  STORAGECLASS  VOLUMEATTRIBUTESCLASS  AGE
 test-cfs-pvc  Pending                                     cfs-csi       <unset>                43s
oc get pv
 No resources found
oc get pod
 NAME                            READY  STATUS   RESTARTS  AGE
 cfs-csi-app                     0/1    Pending  0         3s
 csi-nodeplugin-cfsplugin-2pwnw  2/2    Running  0         7m42s
 csi-nodeplugin-cfsplugin-52drh  2/2    Running  0         7m42s
 csi-nodeplugin-cfsplugin-8fxv8  2/2    Running  0         7m42s
 csi-nodeplugin-cfsplugin-c4qlg  2/2    Running  0         7m42s
 csi-nodeplugin-cfsplugin-g8j5c  2/2    Running  0         7m42s
 csi-nodeplugin-cfsplugin-npdp5  2/2    Running  0         7m42s
 csi-provisioner-cfsplugin-0     2/2    Running  0         5m17s
oc get event
 1m        Warning  ProvisioningFailed    persistentvolumeclaim/test-cfs-pvc                                failed to provision volume with StorageClass "cfs-csi": rpc error: code = Internal desc = [TencentCloudSDKError] Code=ClientError.NetworkError, Message=Fail to get response because Post "https://cfs.internal.tencentcloudapi.com/": net/http: invalid header field value "TC3-HMAC-SHA256 Credential=AKIDZzEF3e5JK8j7IH3cz84nZBbqshJkxymM\n/2025-04-06/cfs/tc3_request, SignedHeaders=content-type;host, Signature=b5bd5cae03a9f9bfe308b308c541c747bdee039860d68901ba6ea4165878a5f6" for key Authorization, RequestId=
  5. Deploy the COSFS CSI driver, following https://github.com/TencentCloud/kubernetes-csi-tencentcloud/blob/master/docs/README_COSFS.md :
oc apply -f cosfs-csi-driver.yaml
oc apply -f cosfs-csi-launcher.yaml
oc apply -f cosfs-csi-node-rbac.yaml
oc apply -f cosfs-csi-node.yaml

It looked like the csi-cosplugin pods failed to start. I found nothing about the cos-lite ConfigMap in their documentation:

oc get pod
 NAME                   READY  STATUS            RESTARTS     AGE
 csi-coslauncher-5w4tq  1/1    Running           0            11m
 csi-coslauncher-c4pvg  1/1    Running           0            11m
 csi-coslauncher-fqq5h  1/1    Running           0            11m
 csi-cosplugin-4fbsv    1/2    CrashLoopBackOff  7 (22s ago)  11m
 csi-cosplugin-q8l52    1/2    CrashLoopBackOff  7 (33s ago)  11m
 csi-cosplugin-rr2qq    1/2    CrashLoopBackOff  7 (26s ago)  11m
oc logs csi-cosplugin-4fbsv -c cosfs
 I0406 08:11:19.141429      1 main.go:40] Building clientset for running in cluster...
 I0406 08:11:19.142693      1 lite_config.go:326] start Init common liteConfigMap ...
 F0406 08:11:19.148332      1 main.go:52] failed to initLiteConfigMap, err: fail get cm: kube-system/cos-lite, err: configmaps "cos-lite" is forbidden: User "system:serviceaccount:kube-system:csi-cos-tencentcloud" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
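
Step 1 above repeats the same tag and push commands for every image. As a rough sketch only, the renaming rule used there (replace the source registry host with the Quay host plus `tcr`) can be written once and applied in a loop. The helper names `mirror_ref` and `mirror_images` are hypothetical, and the registry host is the example value from this issue. It assumes the images were already loaded with `podman load`.

```shell
# Registry host taken from the issue; replace it with your own Quay URL.
REGISTRY='quay.ocp4.example.com:8443'

# Map a source image reference to its mirrored name by replacing the
# source registry host with "$REGISTRY/tcr". Hypothetical helper.
mirror_ref() {
  printf '%s/tcr/%s\n' "$REGISTRY" "${1#*/}"
}

# Tag and push every image reference passed as an argument.
# Stops at the first failure. Hypothetical helper.
mirror_images() {
  for src in "$@"; do
    dst=$(mirror_ref "$src")
    podman tag "$src" "$dst" && podman push "$dst" || return 1
  done
}
```

For example, `mirror_images ccr.ccs.tencentyun.com/tkeimages/csi-provisioner:v2.0.4 ccr.ccs.tencentyun.com/tkeimages/csi-attacher:v3.0.2` would tag and push those two images. The busybox image in step 1 does not follow this rule (it is retagged to `tcr/busybox:latest`), so it would still need its own explicit commands.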

innerforce avatar Apr 08 '25 07:04 innerforce

cbs issue

For "csi-cbs-node-lm9s7 1/2 CrashLoopBackOff", please:
  a. Check the --root-dir parameter of the kubelet.
  b. Try deleting and recreating that pod.
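
One possible way to do the kubelet check is to read the flag from the kubelet command line on the node. This is a minimal sketch: `kubelet_root_dir` is a hypothetical helper, it only understands the `--root-dir=<path>` form, and it does not look at kubelet configuration files. When the flag is absent it reports /var/lib/kubelet, which is the kubelet default.

```shell
# Extract the value of --root-dir from a kubelet command line string.
# Falls back to /var/lib/kubelet, the kubelet default, when the flag
# is absent. Only the "--root-dir=<path>" form is recognised.
kubelet_root_dir() {
  cmdline="$1"
  dir=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^--root-dir=//p' | head -n 1)
  printf '%s\n' "${dir:-/var/lib/kubelet}"
}
```

On a node (for instance from an `oc debug node/<node-name>` session after `chroot /host`), it could be called as `kubelet_root_dir "$(tr '\0' ' ' < "/proc/$(pgrep -o -x kubelet)/cmdline")"`. The result can then be compared with the hostPath directories in cbs-csi-node.yaml, which presumably need to sit under the same root.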

cfs issue

  1. For k8s >= 1.18, please use the YAML files with the "new" suffix:
kubectl apply -f deploy/cfs/kubernetes/csi-nodeplugin-cfsplugin-new.yaml
kubectl apply -f deploy/cfs/kubernetes/csi-provisioner-cfsplugin-new.yaml

Please follow https://github.com/TencentCloud/kubernetes-csi-tencentcloud/blob/master/docs/README_CFS.md

  2. When base64-encoding TENCENTCLOUD_CFS_API_SECRET_ID/TENCENTCLOUD_CFS_API_SECRET_KEY, please add -n so that echo does not append a trailing newline to the value:

echo -n <ID-or-KEY> | base64
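
The ProvisioningFailed event in the issue shows a "\n" right after the secret ID inside the Authorization header, which is consistent with a newline having been encoded into the secret. The short sketch below illustrates the difference; `SECRET_ID` holds a made-up placeholder, not a real credential, and `printf '%s'` is used as a portable equivalent of `echo -n`.

```shell
# Hypothetical placeholder value, not a real credential.
SECRET_ID='AKIDexample'

# Plain echo appends a newline, which becomes part of the encoded value.
with_newline=$(echo "$SECRET_ID" | base64)

# printf '%s' (like echo -n) encodes only the characters of the ID.
without_newline=$(printf '%s' "$SECRET_ID" | base64)

echo "with newline:    $with_newline"
echo "without newline: $without_newline"
```

The two encoded strings differ only at the end, so the mistake is easy to miss when pasting the value into cfs-secret.yaml. Decoding the stored value and counting its bytes, for example with `base64 -d | wc -c`, is one way to confirm that no extra byte was encoded.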

cos issue

Please run kubectl edit clusterrole csi-cos-tencentcloud and add the following rule:

  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create", "delete", "update"]
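
After editing the ClusterRole, the permissions can be checked before restarting the csi-cosplugin pods. The sketch below is one way to do that with `oc auth can-i` and impersonation. The helper names `sa_user` and `check_configmap_access` are hypothetical, and the check assumes a logged-in `oc` session whose user is allowed to impersonate service accounts.

```shell
# Build the user name Kubernetes assigns to a service account,
# matching the form shown in the "forbidden" log line.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the service account may perform each verb
# from the suggested rule on configmaps in its namespace.
# Arguments: namespace, service account name. Hypothetical helper.
check_configmap_access() {
  user=$(sa_user "$1" "$2")
  for verb in get create delete update; do
    printf '%s: ' "$verb"
    oc auth can-i "$verb" configmaps --as="$user" -n "$1"
  done
}
```

With the names from the log above, the call would be `check_configmap_access kube-system csi-cos-tencentcloud`. Each verb should be followed by "yes" once the rule is in place.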

borgerli avatar Apr 08 '25 09:04 borgerli

And for the cfs issue, when base64-encoding TENCENTCLOUD_CFS_API_SECRET_ID/TENCENTCLOUD_CFS_API_SECRET_KEY, please add -n:

echo -n <ID-or-KEY> | base64

borgerli avatar Apr 08 '25 09:04 borgerli