Bug: datashim containers keep crashing on version 0.4.1 "Failed to connect to the CSI driver"

Open adippl opened this issue 1 year ago • 4 comments

What happened:

All datashim containers keep crashing on version 0.4.1. The 0.4.0 release works perfectly normally.

kubectl -n dlf logs pod/csi-attacher-s3-0
I0311 15:16:58.441647       1 main.go:109] "Version" version="v4.7.0"
I0311 15:16:58.443204       1 connection.go:234] "Connecting" address="unix:///csi/csi.sock"
I0311 15:17:08.443973       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I0311 15:17:18.444043       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I0311 15:17:28.443460       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
E0311 15:17:28.443539       1 main.go:149] "Failed to connect to the CSI driver" err="context deadline exceeded" csiAddress="/csi/csi.sock"
kubectl -n dlf get all
NAME                                    READY   STATUS             RESTARTS      AGE
pod/csi-attacher-s3-0                   0/1     CrashLoopBackOff   4 (71s ago)   5m19s
pod/csi-provisioner-s3-0                0/1     CrashLoopBackOff   4 (79s ago)   5m19s
pod/csi-s3-29jdp                        0/2     CrashLoopBackOff   9 (83s ago)   5m19s
pod/csi-s3-snjj7                        0/2     CrashLoopBackOff   9 (82s ago)   5m18s
pod/csi-s3-tzrgv                        0/2     CrashLoopBackOff   9 (67s ago)   5m19s
pod/dataset-operator-78555d79d6-98k5s   1/1     Running            0             5m20s

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

It happens on all of my clusters

Anything else we need to know?:

Environment:

  • Datashim version: 0.4.1

  • Kubernetes version (use kubectl version): Client Version: v1.32.1, Kustomize Version: v5.5.0, Server Version: v1.32.1

  • Kubernetes distribution: normal kubernetes

  • Cloud provider or hardware configuration: qemu/kvm

  • OS (e.g: cat /etc/os-release): Gentoo

  • Kernel (e.g. uname -a): 6.6.74-gentoo-dist

  • Install tools: kubeadm

  • Others:

adippl avatar Mar 11 '25 15:03 adippl

@adippl Apologies for the delay, and thanks for the bug report. We'll try to reproduce it on our end.

srikumar003 avatar Mar 21 '25 00:03 srikumar003

Hello, I just got the same error while deploying 0.4.1 with the latest revision of the Helm chart. Release 0.4.0 works fine here as well.

Client Version: v1.32.3, Kustomize Version: v5.5.0, Server Version: v1.31.7+rke2r1, OS release: Ubuntu 22.04.5 LTS, Hardware: KVM + Host CPU (6 cores + 12 GB RAM)

kubectl -n datashim logs pod/csi-attacher-s3-0
I0409 11:20:09.906957       1 main.go:109] "Version" version="v4.7.0"
I0409 11:20:09.907682       1 connection.go:234] "Connecting" address="unix:///csi/csi.sock"
I0409 11:20:19.908374       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I0409 11:20:29.908428       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I0409 11:20:39.908314       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
E0409 11:20:39.908400       1 main.go:149] "Failed to connect to the CSI driver" err="context deadline exceeded" csiAddress="/csi/csi.sock"
kubectl -n datashim logs pod/csi-nodeplugin-nfsplugin-6zkrg
Defaulted container "node-driver-registrar" out of: node-driver-registrar, nfs
I0409 11:26:11.140098       1 main.go:150] "Version" version="v1.12.0"
I0409 11:26:11.140169       1 main.go:151] "Running node-driver-registrar" mode=""
I0409 11:26:11.140174       1 main.go:172] "Attempting to open a gRPC connection" csiAddress="/plugin/csi.sock"
I0409 11:26:11.140184       1 connection.go:234] "Connecting" address="unix:///plugin/csi.sock"
I0409 11:26:21.140564       1 connection.go:253] "Still connecting" address="unix:///plugin/csi.sock"
I0409 11:26:31.140493       1 connection.go:253] "Still connecting" address="unix:///plugin/csi.sock"
I0409 11:26:41.141151       1 connection.go:253] "Still connecting" address="unix:///plugin/csi.sock"
E0409 11:26:41.141222       1 main.go:176] "Error connecting to CSI driver" err="context deadline exceeded"
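
The logs above come from the sidecar only (kubectl defaulted to node-driver-registrar), so they show the socket timeout but not why the driver never created the socket. A minimal sketch for dumping the logs of every container in a pod, assuming kubectl access to the cluster; the function name dump_pod_logs is made up for illustration:

```shell
#!/bin/sh
# Print the logs of every container in a pod, not only the default one.
# Usage: dump_pod_logs <namespace> <pod>
dump_pod_logs() {
  ns=$1
  pod=$2
  # List the container names declared in the pod spec.
  containers=$(kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.spec.containers[*].name}')
  for c in $containers; do
    echo "=== $pod / $c ==="
    # --tail keeps the output short; add --previous to see the last crashed run.
    kubectl -n "$ns" logs "$pod" -c "$c" --tail=20
  done
}

# Example (namespace and pod name taken from the logs above):
# dump_pod_logs datashim csi-nodeplugin-nfsplugin-6zkrg
```

For the pod above this would also print the log of the nfs container, which is the one expected to serve /plugin/csi.sock.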

celi28 avatar Apr 09 '25 11:04 celi28

@celi28 @adippl The s3driver and nfs-plugin images in the official Helm chart use the 8f50a01 (0.4.1) tag, which may not support linux/amd64. I manually changed the tag to latest, and it runs successfully.

kubectl logs -ndlf csi-s3-4r7ws -c csi-s3
exec /s3driver: exec format error
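
An "exec format error" typically means the binary inside the image was built for a different CPU architecture than the node it runs on. A rough sketch for comparing the two, assuming kubectl and docker are available; IMAGE is a placeholder rather than the chart's actual image reference, and the to_oci_arch helper is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: map `uname -m` output to the architecture names
# used in container image manifests (amd64, arm64, ...).
to_oci_arch() {
  case "$1" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    *)       echo "$1" ;;   # pass anything else through unchanged
  esac
}

# Architecture of the machine this script runs on, in manifest naming:
# to_oci_arch "$(uname -m)"

# Architecture of each node as reported by the kubelet:
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'

# Architectures published for an image tag (IMAGE is a placeholder;
# the exact JSON layout differs between single- and multi-arch images):
# docker manifest inspect --verbose "$IMAGE" | grep '"architecture"'
```

If the node reports amd64 and the image tag lists only another architecture, that would be consistent with the error above.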

cc @srikumar003

chengzhycn avatar Jun 20 '25 03:06 chengzhycn

I can confirm that this workaround works, e.g.:

helm install datashim datashim/datashim-charts \
--version v0.4.1 \
--namespace dlf \
-f -<<EOF
csi-nfs-chart:
  enabled: false
csi-s3-chart:
  enabled: true
  csis3:
    image: csi-s3
    tag: latest
EOF

maksimsamt avatar Aug 01 '25 16:08 maksimsamt