
"Transport endpoint is not connected" error happening frequently and intermittently

Open rrehman-hbk opened this issue 1 year ago • 4 comments

We are using datashim to connect to an S3 bucket with an access key and secret. We mount that volume in 5-6 services. Even when the services have not restarted, we see them throwing the error "Transport endpoint is not connected". After we restart a service, it is able to connect and read data again.

We have installed datashim in the dlf namespace and the dataset in the namespace where our other services are present. Pasting the csi-s3 pod logs:

Defaulted container "driver-registrar" out of: driver-registrar, csi-s3
I1214 06:00:34.644398       1 main.go:167] Version: v2.8.0
I1214 06:00:34.644462       1 main.go:168] Running node-driver-registrar in mode=registration
I1214 06:00:34.644928       1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1214 06:00:34.644954       1 connection.go:164] Connecting to unix:///csi/csi.sock
I1214 06:00:35.646808       1 main.go:199] Calling CSI driver to discover driver name
I1214 06:00:35.646842       1 connection.go:193] GRPC call: /csi.v1.Identity/GetPluginInfo
I1214 06:00:35.646849       1 connection.go:194] GRPC request: {}
I1214 06:00:35.655130       1 connection.go:200] GRPC response: {"name":"ch.ctrox.csi.s3-driver","vendor_version":"v1.1.1"}
I1214 06:00:35.655149       1 connection.go:201] GRPC error: <nil>
I1214 06:00:35.655165       1 main.go:209] CSI driver name: "ch.ctrox.csi.s3-driver"
I1214 06:00:35.655288       1 node_register.go:53] Starting Registration Server at: /registration/ch.ctrox.csi.s3-driver-reg.sock
I1214 06:00:35.655532       1 node_register.go:62] Registration Server started at: /registration/ch.ctrox.csi.s3-driver-reg.sock
I1214 06:00:35.655661       1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I1214 06:00:36.469837       1 main.go:102] Received GetInfo call: &InfoRequest{}
I1214 06:00:36.470165       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi-s3/registration"
I1214 06:00:36.485115       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

rrehman-hbk avatar Dec 14 '23 10:12 rrehman-hbk

@rrehman-hbk there seems to be no error in the logs above. There is one csi-s3 pod per K8s node. Please check all the other ones and paste any lines with errors or warnings.
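One way to pull out just the warnings and errors is sketched below. It assumes each pod's logs have been saved as text and that they use the klog-style header seen in the driver-registrar output above, where the first character is the severity (I, W, E or F). Other containers may log in a different format, and the name `find_problem_lines` is only illustrative.

```python
import re

# klog-style header: severity letter, MMDD, time, PID, "file:line]", then the message.
KLOG_HEADER = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([^\s\]]+)\] (.*)$"
)


def find_problem_lines(log_text):
    """Return (severity, source, message) for each warning, error, or fatal line."""
    problems = []
    for line in log_text.splitlines():
        match = KLOG_HEADER.match(line.strip())
        # Lines that do not match the header (e.g. kubectl notices) are skipped.
        if match and match.group(1) in "WEF":
            problems.append((match.group(1), match.group(5), match.group(6)))
    return problems
```

Filtering on the severity letter rather than searching for the word "error" avoids false hits such as the `GRPC error: <nil>` line above, which is logged at info level.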

srikumar003 avatar Dec 14 '23 11:12 srikumar003

This entire instance runs on K3s on a VM. We have only one pod. I am attaching the csi-attacher logs too: csi-attacher

Output of kubectl get pods in the dlf namespace, where datashim is installed: [image]

The service that gets disconnected from S3 throws the following error: s3-01: Transport endpoint is not connected
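For anyone who wants to detect this state from inside a service rather than wait for a read to fail, here is a minimal sketch. "Transport endpoint is not connected" is the Linux message for the errno ENOTCONN, which is commonly seen when the user-space process behind a FUSE mount has exited. Whether that is the cause here is not established in this thread, and the function name and any path passed to it are hypothetical.

```python
import errno
import os


def is_mount_disconnected(path, stat=os.stat):
    """Return True if accessing `path` fails with ENOTCONN.

    The `stat` parameter exists only so the check can be exercised
    without a real broken mount.
    """
    try:
        stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return True
        raise  # a missing path or a permission error is a different problem
    return False
```

A service could call this on its dataset mount point in a health check, so that the orchestrator restarts it when the mount goes away instead of leaving it to fail on the next read.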

rrehman-hbk avatar Dec 14 '23 11:12 rrehman-hbk

More info: uploading a small file (e.g. 2.4 MB) works. When we tried to upload an 87 MB file, the upload got to a certain percentage and then failed; when I check S3, I see that it fails after 10 MB. Is this due to a setting somewhere? I can upload the same file directly to the S3 bucket in AWS.
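To narrow down where a large write stops, a small probe like the following can be run against the mounted path. It is only a sketch: `probe_write` is an illustrative name, the target path is whatever file you choose on the mount, and because the OS and the mounter may buffer data, the reported byte count is approximate rather than the exact amount that reached S3.

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB blocks


def probe_write(path, total_mib):
    """Write total_mib MiB of zeros to path.

    Returns (bytes_handed_to_os, error), where error is None on success
    or the OSError that interrupted the write (or the final close).
    """
    written = 0
    block = b"\0" * CHUNK
    try:
        with open(path, "wb") as handle:
            for _ in range(total_mib):
                handle.write(block)
                handle.flush()  # push Python's buffer down to the OS
                written += CHUNK
    except OSError as exc:
        return written, exc
    return written, None
```

Running it with a few sizes on either side of 10 MiB, and comparing the returned error with the object size that shows up in the bucket, would show whether the failure is tied to a size threshold or to the mount dropping mid-transfer.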

@srikumar003 Is there any config that limits uploads to 10 MB?

rrehman-hbk avatar Dec 18 '23 10:12 rrehman-hbk

Same issue here. @rrehman-hbk, is there any update?

hao-tang-ts avatar Aug 01 '24 14:08 hao-tang-ts