aws-ebs-csi-driver
Volume still hangs on Karpenter Node Consolidation/Termination
/kind bug
What happened? As discussed in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1665, @torredil said it was fixed in v1.27 (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1665#issuecomment-1898607940), but we still see the problem with v1.28.
- A pod using volume pv-A is running on node N1
- Karpenter terminates the pod and terminates node N1
- K8s starts a new pod and tries to attach volume pv-A, but the volume still takes 6 minutes to be released and attached to the new pod
What you expected to happen?
- After the old pod has been terminated, pv-A should be released and attachable to the new pod
How to reproduce it (as minimally and precisely as possible)?
- I'm using https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html to set up a cluster with 3 nodes:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev
spec:
  version: 8.12.2
  volumeClaimDeletePolicy: DeleteOnScaledownAndClusterDeletion
  updateStrategy:
    changeBudget:
      maxSurge: 2
      maxUnavailable: 1
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
    podTemplate:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
          topology.kubernetes.io/zone: eu-central-1a
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 5Gi
              cpu: 1
            limits:
              memory: 5Gi
              cpu: 2
    config:
      node.store.allow_mmap: false
```
- Trigger a spot instance termination or just delete one EC2 instance
- The node is removed from k8s very quickly, the old pod is Terminated, and k8s starts a new pod
- The new pod is stuck for 6 minutes with the error: Multi-Attach error for volume "pvc-xxxxx-xxxxx-xxx" Volume is already exclusively attached to one node and can't be attached to another (see the watch commands below, after the controller logs)
- After 6 minutes the new pod can attach the volume
- Here are the logs of ebs-csi-controller:
I0302 06:12:10.305080 1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1" devicePath="/dev/xvdaa"
<< at 06:14 the node has been terminated but no logs here >>
I0302 06:20:18.486042 1 controller.go:471] "ControllerUnpublishVolume: detaching" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:18.584737 1 cloud.go:792] "DetachDisk: called on non-attached volume" volumeID="vol-02b33186429105461"
I0302 06:20:18.807752 1 controller.go:474] "ControllerUnpublishVolume: attachment not found" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:19.124534 1 controller.go:421] "ControllerPublishVolume: attaching" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb"
I0302 06:20:20.635493 1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb" devicePath="/dev/xvdaa"
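To watch the stuck attachment from outside while the new pod is pending, I use roughly these commands (the namespace, pod name, and volume ID are from my setup and will differ elsewhere):

```bash
# Watch VolumeAttachment objects: the attachment to the terminated node
# must disappear before a new one can be created for the new node.
kubectl get volumeattachments -o wide -w

# Show the Multi-Attach / FailedAttachVolume events on the pending pod.
kubectl -n datalayer describe pod dev-es-default-0 | tail -n 20

# What EC2 itself thinks the attachment state is (needs AWS CLI credentials).
aws ec2 describe-volumes --volume-ids vol-02b33186429105461 \
  --query 'Volumes[0].Attachments'
```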
Anything else we need to know?: I set up the CSI driver using the EKS add-on.
Environment
- Kubernetes version (use kubectl version):
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0-eks-c417bb3
- Driver version:
v1.28.0-eksbuild.1
I updated to Karpenter 0.35 and am using the AL2023 image; the problem still happens.
Some update: it seems to be caused by the Elasticsearch StatefulSet. Terminating the ES pod is quite slow, so it's possible the driver pod is killed before the ES pod, and then the volume is not released.
Update: when I set PRE_STOP_ADDITIONAL_WAIT_SECONDS to 5 seconds, the new pod can attach the PVC normally, so I think the long delay in stopping the pod is the reason the volume gets stuck.
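(For reference, I set the variable roughly like this; since the driver is installed as an EKS add-on the override might be reverted by add-on updates, so treat it as a sketch:)

```bash
# Add the env var to the ebs-plugin container of the node DaemonSet
# and wait for the rollout to finish.
kubectl -n kube-system set env daemonset/ebs-csi-node -c ebs-plugin \
  PRE_STOP_ADDITIONAL_WAIT_SECONDS=5
kubectl -n kube-system rollout status daemonset/ebs-csi-node
```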
Nope, on another try the PV still got stuck 🤦
Hey there @levanlongktmt thank you for reporting this.
It appears your spot instances are being ungracefully terminated. Take a look at the documentation for guidance on navigating this issue: 6-Minute Delays in Attaching Volumes - What steps can be taken to mitigate this issue?
If you are still running into delays but have already configured the Kubelet and enabled interruption handling in Karpenter, please let us know. Thanks!
@torredil oops... my user data was missing the last line:
systemctl restart kubelet
Let me try with it.
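For context, the graceful-shutdown part of my user data now looks roughly like this (a sketch of the kubelet settings the mitigation doc describes; the config path is the AL2 EKS AMI default and may differ on other AMIs):

```bash
#!/bin/bash
# Give kubelet a graceful node shutdown window so workload pods and the
# ebs-csi-node pod have time to clean up before the instance disappears.
KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
echo "$(jq '.shutdownGracePeriod="45s"' $KUBELET_CONFIG)" > $KUBELET_CONFIG
echo "$(jq '.shutdownGracePeriodCriticalPods="15s"' $KUBELET_CONFIG)" > $KUBELET_CONFIG
systemctl restart kubelet
```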
Hey @torredil, I tried but still no luck :(
I already added the SQS queue to handle interruptions, so when I trigger an interruption in the AWS console, Karpenter handles it and launches a new instance very quickly. I also followed the logs of ebs-csi-node, but it doesn't show anything related to the preStop hook; here are all of its logs:
I0304 15:40:38.249081 1 driver.go:83] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.28.0"
I0304 15:40:38.250134 1 node.go:93] "regionFromSession Node service" region=""
I0304 15:40:38.252657 1 metadata.go:85] "retrieving instance data from ec2 metadata"
I0304 15:40:38.257835 1 metadata.go:92] "ec2 metadata is available"
I0304 15:40:38.258488 1 metadata_ec2.go:25] "Retrieving EC2 instance identity metadata" regionFromSession=""
I0304 15:47:29.651847 1 mount_linux.go:243] Detected OS without systemd
After the graceful shutdown of the Elasticsearch pod, the ebs-csi-node pod was killed immediately, so the preStop hook was ignored.
I thought it might be caused by a bug in AL2023, so I switched back to AL2, but still no luck.
I will try setting InhibitDelayMaxSec to 60 (45 + 15).
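For the record, I'm applying it in user data roughly like this (a sketch; the drop-in file name is my own choice):

```bash
# Let systemd-logind honor kubelet's shutdown inhibitor lock for up to 60s
# (shutdownGracePeriod 45s + shutdownGracePeriodCriticalPods 15s).
mkdir -p /etc/systemd/logind.conf.d
cat <<'EOF' > /etc/systemd/logind.conf.d/99-inhibit-delay.conf
[Login]
InhibitDelayMaxSec=60
EOF
systemctl restart systemd-logind
```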
Still no luck: the volume is still stuck for 6 minutes and the preStop hook is not working 😭
@levanlongktmt Thanks for the followup info. Something that came to mind here is that it might be possible for newer versions of Karpenter to be incompatible with the pre-stop lifecycle hook due to this change: https://github.com/kubernetes-sigs/karpenter/pull/508.
You'll notice that the LCH won't run if the following taint is not present on the node during a termination event:
node.kubernetes.io/unschedulable
We'll take a closer look and report back with more details or a fix. Thank you!
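In the meantime, one way to check which taints are actually present while the node is being drained (purely for diagnosis; the taint keys differ between Karpenter versions) is something like:

```bash
# Print the taints on the draining node; the pre-stop hook only waits for
# volume cleanup if it recognizes a "node is being drained" taint.
kubectl get node <NODE_NAME> \
  -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
```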
@torredil here are the logs of Karpenter and the Elasticsearch pod. As far as I can see:
- At 16:22:20 Karpenter gets the interruption message
- Very quickly Karpenter taints the node and the Elasticsearch pod gets the shutdown signal
- After 46 seconds the Elasticsearch pod is terminated
- At 16:23:09 Karpenter starts deleting the node, and the node is deleted very quickly
Logs of Karpenter
{"level":"INFO","time":"2024-03-04T16:22:20.270Z","logger":"controller.interruption","message":"initiating delete from interruption message","commit":"2c8f2a5","queue":"Karpenter-kollekt-eks","messageKind":"SpotInterruptionKind","nodeclaim":"default-2bd27","action":"CordonAndDrain","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:22:20.336Z","logger":"controller.node.termination","message":"tainted node","commit":"2c8f2a5","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:22:21.364Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"24.462475ms"}
{"level":"INFO","time":"2024-03-04T16:22:21.364Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"2c8f2a5","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2024-03-04T16:22:21.379Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"2c8f2a5","nodepool":"default","nodeclaim":"default-pk44s","requests":{"cpu":"680m","memory":"5240Mi","pods":"5"},"instance-types":"a1.2xlarge, a1.4xlarge, a1.metal, a1.xlarge, c6g.12xlarge and 91 other(s)"}
{"level":"INFO","time":"2024-03-04T16:22:23.782Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"2c8f2a5","nodeclaim":"default-pk44s","provider-id":"aws:///eu-central-1a/i-09a52eea14ff0b6c9","instance-type":"r6gd.medium","zone":"eu-central-1a","capacity-type":"spot","allocatable":{"cpu":"940m","ephemeral-storage":"17Gi","memory":"7075Mi","pods":"8","vpc.amazonaws.com/pod-eni":"4"}}
{"level":"INFO","time":"2024-03-04T16:22:31.356Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"16.586479ms"}
{"level":"INFO","time":"2024-03-04T16:22:41.353Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"13.509969ms"}
{"level":"INFO","time":"2024-03-04T16:22:51.355Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"14.534034ms"}
{"level":"INFO","time":"2024-03-04T16:22:58.280Z","logger":"controller.nodeclaim.lifecycle","message":"registered nodeclaim","commit":"2c8f2a5","nodeclaim":"default-pk44s","provider-id":"aws:///eu-central-1a/i-09a52eea14ff0b6c9","node":"i-09a52eea14ff0b6c9.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:23:01.356Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"13.781092ms"}
{"level":"INFO","time":"2024-03-04T16:23:09.571Z","logger":"controller.node.termination","message":"deleted node","commit":"2c8f2a5","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:23:09.944Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"2c8f2a5","nodeclaim":"default-2bd27","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal","provider-id":"aws:///eu-central-1a/i-0502b920c79f40b5a"}
{"level":"INFO","time":"2024-03-04T16:23:10.455Z","logger":"controller.disruption","message":"pod \"datalayer/dev-es-default-0\" has a preferred Anti-Affinity which can prevent consolidation","commit":"2c8f2a5"}
{"level":"INFO","time":"2024-03-04T16:23:10.698Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"83.215948ms"}
{"level":"INFO","time":"2024-03-04T16:23:10.731Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"2c8f2a5","nodeclaim":"default-pk44s","provider-id":"aws:///eu-central-1a/i-09a52eea14ff0b6c9","node":"i-09a52eea14ff0b6c9.eu-central-1.compute.internal","allocatable":{"cpu":"940m","ephemeral-storage":"18233774458","hugepages-1Gi":"0","hugepages-2Mi":"0","hugepages-32Mi":"0","hugepages-64Ki":"0","memory":"7552796Ki","pods":"8"}}
{"level":"INFO","time":"2024-03-04T16:23:12.487Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"12.009019ms"}
Logs of elasticsearch pod
{"@timestamp": "2024-03-04T16:22:20+00:00", "message": "retrieving node ID", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp": "2024-03-04T16:22:21+00:00", "message": "initiating node shutdown", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp":"2024-03-04T16:22:21.338Z", "log.level": "INFO", "message":"creating shutdown record {nodeId=[qElhr0BPQnufjz5_CdDGMg], type=[RESTART], reason=[pre-stop hook]}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][masterService#updateTask][T#2]","log.logger":"org.elasticsearch.xpack.shutdown.TransportPutShutdownNodeAction","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:22:21.530Z", "log.level": "INFO", "message":"Aborting health node task due to node [{dev-es-default-0}{qElhr0BPQnufjz5_CdDGMg}] shutting down.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][clusterApplierService#updateTask][T#1]","log.logger":"org.elasticsearch.health.node.selection.HealthNodeTaskExecutor","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:22:21.535Z", "log.level": "INFO", "message":"Starting node shutdown sequence for ML", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][clusterApplierService#updateTask][T#1]","log.logger":"org.elasticsearch.xpack.ml.MlLifeCycleService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp": "2024-03-04T16:22:21+00:00", "message": "waiting for node shutdown to complete", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp": "2024-03-04T16:22:21+00:00", "message": "delaying termination for 44 seconds", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp":"2024-03-04T16:23:05.782Z", "log.level": "INFO", "message":"stopping ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.783Z", "log.level": "INFO", "message":"shutting down watcher thread", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[file-watcher[/usr/share/elasticsearch/config/operator/settings.json]]","log.logger":"org.elasticsearch.common.file.AbstractFileWatchingService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.785Z", "log.level": "INFO", "message":"watcher service stopped", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.common.file.AbstractFileWatchingService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.790Z", "log.level": "INFO", "message":"[controller/83] [Main.cc@176] ML controller exiting", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.logging.CppLogMessageHandler","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.792Z", "log.level": "INFO", "message":"Native controller process has stopped - no new native processes can be started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.NativeController","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.793Z", "log.level": "INFO", "message":"stopping watch service, reason [shutdown initiated]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.xpack.watcher.WatcherService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.794Z", "log.level": "INFO", "message":"watcher has stopped and shutdown", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][generic][T#3]","log.logger":"org.elasticsearch.xpack.watcher.WatcherLifeCycleService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"timestamp": "2024-03-04T16:23:06+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"@timestamp":"2024-03-04T16:23:06.552Z", "log.level": "INFO", "message":"stopped", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.554Z", "log.level": "INFO", "message":"closing ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.576Z", "log.level": "INFO", "message":"evicted [0] entries from cache after reloading database [/tmp/elasticsearch-3976228296045036888/geoip-databases/qElhr0BPQnufjz5_CdDGMg/GeoLite2-Country.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.577Z", "log.level": "INFO", "message":"evicted [0] entries from cache after reloading database [/tmp/elasticsearch-3976228296045036888/geoip-databases/qElhr0BPQnufjz5_CdDGMg/GeoLite2-ASN.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.578Z", "log.level": "INFO", "message":"evicted [25] entries from cache after reloading database [/tmp/elasticsearch-3976228296045036888/geoip-databases/qElhr0BPQnufjz5_CdDGMg/GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.586Z", "log.level": "INFO", "message":"closed", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
@torredil do you think this quick and dirty fix will work 😆?
@torredil it seems k8s does not call preStop, because if it did I would at least see the log PreStop: executing PreStop lifecycle hook, but I don't see it.
@torredil any good news on this 😀?
Hi, sorry about the wait - your issue is probably caused by what #1969 solves - Karpenter changed the taints they used when draining nodes and our LCH needs to be changed to account for it. That fix should be available in the next release of the EBS CSI Driver, expected to happen later this month.
Amazing, thanks so much @ConnorJC3 😍 I will test again when the new version is released.
@ConnorJC3 I was trying to understand which version of Karpenter introduced this change. I was about to upgrade my CSI driver to pick up the fix, but I guess I will wait for the next release.
I see https://github.com/kubernetes-sigs/karpenter/pull/508 was merged in October :thinking:
@primeroz here is the list of Karpenter versions affected
v1.29.0 has been released, which contains the fix.
I just upgraded my dev cluster to 1.29.0 while using karpenter-aws 0.35.2, but I still have the same multi-attach problem as before:
- Have a stateful workload running with a PVC mounted
- Delete a node with kubectl delete node
- Karpenter intercepts it through its finalizer and sets the taint:
    - effect: NoSchedule
      key: karpenter.sh/disruption
      value: disrupting
- Pod is evicted: 26s Normal Evicted pod/c-pod-0 Evicted pod
- A new node is created by Karpenter and the pod is moved to it
- Multi-Attach error for volume "pvc-9b50cf31-5a24-4783-8039-a362ad2a7c0d" Volume is already exclusively attached to one node and can't be attached to another for 6+ minutes, until timeout
- Pod starts on the new node
@primeroz do you have the logs from the driver from when your node was disrupted? (Also, it would be useful to increase your log level while collecting these.)
@alexandermarston there are no events whatsoever in the EBS CSI pod running on that node when I delete the node.
I can see the preStop hook is in the pod spec... so I guess that when we delete a node, rather than replace it through Karpenter disruption, the preStop on the ebs-csi-node pod is never run?
Note that in my case I am deleting the node to get it replaced by Karpenter.
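(This is roughly how I checked that the hook is present; the namespace assumes the add-on defaults:)

```bash
# Confirm the preStop hook is defined on the ebs-plugin container of the
# node pod that was running on the deleted node.
kubectl -n kube-system get pod ebs-csi-node-rj6qg \
  -o jsonpath='{.spec.containers[?(@.name=="ebs-plugin")].lifecycle.preStop}'
```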
+ ebs-csi-node-rj6qg › node-driver-registrar
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.381506 1 main.go:135] Version: v2.10.0
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.381558 1 main.go:136] Running node-driver-registrar in mode=
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.381565 1 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.382280 1 main.go:164] Calling CSI driver to discover driver name
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.385954 1 main.go:173] CSI driver name: "ebs.csi.aws.com"
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.385996 1 node_register.go:55] Starting Registration Server at: /registration/ebs.csi.aws.com-reg.sock
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.386143 1 node_register.go:64] Registration Server started at: /registration/ebs.csi.aws.com-reg.sock
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.386255 1 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.832022 1 main.go:90] Received GetInfo call: &InfoRequest{}
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.857979 1 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
ebs-csi-node-rj6qg liveness-probe I0322 14:33:44.130483 1 main.go:133] "Calling CSI driver to discover driver name"
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022623.64,"caller":"driver/driver.go:84","msg":"Driver Information","v":0,"Driver":"ebs.csi.aws.com","Version":"v1.29.0"}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022623.6697,"caller":"driver/node.go:97","msg":"regionFromSession Node service","v":0,"region":""}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022623.6895,"caller":"cloud/metadata.go:85","msg":"retrieving instance data from ec2 metadata","v":0}
ebs-csi-node-rj6qg liveness-probe I0322 14:33:44.133455 1 main.go:141] "CSI driver name" driver="ebs.csi.aws.com"
ebs-csi-node-rj6qg liveness-probe I0322 14:33:44.133475 1 main.go:170] "ServeMux listening" address="0.0.0.0:9808"
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022625.4724,"caller":"cloud/metadata.go:92","msg":"ec2 metadata is available","v":0}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022626.2078,"caller":"cloud/metadata_ec2.go:25","msg":"Retrieving EC2 instance identity metadata","v":0,"regionFromSession":""}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118023644.7207,"caller":"driver/node.go:875","msg":"Unexpected failure when attempting to remove node taint(s)","err":"isAllocatableSet: driver not found on node ip-10-1-5-195.us-west-2.compute.internal"}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118024155.2048,"caller":"driver/node.go:961","msg":"CSINode Allocatable value is set","v":0,"nodeName":"ip-10-1-5-195.us-west-2.compute.internal","count":25}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118326119.883,"caller":"[email protected]/mount_linux.go:243","msg":"Detected OS without systemd","v":2}
### Delete node here
- ebs-csi-node-rj6qg › ebs-plugin
- ebs-csi-node-rj6qg › liveness-probe
+ ebs-csi-node-rj6qg › liveness-probe
- ebs-csi-node-rj6qg › node-driver-registrar
+ ebs-csi-node-rj6qg › ebs-plugin
+ ebs-csi-node-rj6qg › node-driver-registrar
- ebs-csi-node-rj6qg › ebs-plugin
- ebs-csi-node-rj6qg › node-driver-registrar
- ebs-csi-node-rj6qg › liveness-probe
@primeroz what exactly do you mean by "delete a node"? Assuming you mean something like deleting the node via the AWS Console - that sounds like ungraceful termination, which will not run pre-stop hooks as it results in an immediate and uninterruptible shutdown.
@ConnorJC3 I mean kubectl delete node XXX, which is something Karpenter supports and manages through its finalizer; it goes through the same steps as a Karpenter-managed disruption - see https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/#8-delete-karpenter-nodes-manually
Anyway, I am now testing with a Karpenter disruption by adding an annotation to the NodePool, which triggers a full replacement of all nodes created by that NodePool since the spec has changed. The same multi-attach error happens.
Collecting logs on the next node termination.
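(For context, this is roughly how I force the replacement; the annotation key is arbitrary and the NodePool name assumes my setup:)

```bash
# Bump an annotation inside the NodePool's template so Karpenter treats the
# existing nodes as drifted and replaces them.
kubectl patch nodepool default --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"force-rotate":"2024-03-22"}}}}}'
```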
OK, I've been able to test this by manually tainting a node and then running the preStop hook with:
k exec ebs-csi-node-l7zzf -- "/bin/aws-ebs-csi-driver" "pre-stop-hook"
The preStop hook is doing everything it should, from what I understand.
Again, from my limited understanding, I imagine the issue is that your service (which is using the volume) is taking longer than the terminationGracePeriodSeconds of the EBS CSI driver to shut down and release the volume. If your service takes longer than 30 seconds to terminate, then the EBS driver will never have a chance to do its work.
You could either try increasing the terminationGracePeriod of the EBS CSI Driver or lowering the grace period of your service.
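Something like this should bump the driver's grace period (the DaemonSet name and namespace assume the add-on defaults, and an add-on update may overwrite the change):

```bash
# Give the ebs-csi-node pod more time than the workload needs to shut down,
# so the preStop hook can wait for volumes to be cleaned up.
kubectl -n kube-system patch daemonset ebs-csi-node --type merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":300}}}}'
```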
> If your service takes longer than 30 seconds to terminate, then the EBS driver will never have a chance to do its work.
> You could either try increasing the terminationGracePeriod of the EBS CSI Driver or lowering the grace period of your service.
This is entirely possible; I am indeed testing with a service that has a higher-than-normal grace period.
I will test this :pray:
@alexandermarston
Increasing terminationGracePeriod: 3700 on the ebs-csi-node DaemonSet did not help. I triggered an update of the nodes in the NodePool by updating an annotation, but it is still failing with multi-attach errors.
Also, what's interesting, to me at least, is that in the logs of the ebs-csi-node pod I never see klog.InfoS("PreStop: executing PreStop lifecycle hook"), so I guess that when Karpenter replaces a node it never triggers a termination of the DaemonSet pods running on it?
I know for sure that Karpenter does not drain the node in the same way that kubectl drain does. I can see that the non-DaemonSet pods are being evicted one by one, but I see nothing on the ebs-csi-node pods.
I have been watching the ebs-csi-node pod and tailing events for it... nothing. It seems like once the non-DaemonSet pods are all evicted, the node just goes away, and so do all the DaemonSet pods on it :thinking:
You won't see the logs from the preStop hook handler, as they are not logged in the same way normal pod stdout is. You will be able to see if the preStop hook failed in the Kubernetes Events though.
Can you try deleting the node again, but manually executing the preStop hook and sharing the output?
k exec <EBS-CSI-POD-NAME> -- "/bin/aws-ebs-csi-driver" "pre-stop-hook"
> You won't see the logs from the preStop hook handler, as they are not logged in the same way normal pod stdout is.
To be specific, Kubernetes logs output from pre-stop hooks to the kubelet log - so you can look there to determine if the hook is running (although this can be difficult on a terminated node if you aren't streaming the logs anywhere).
@alexandermarston's suggestion above is a good idea to check that the hook is able to run and connect to the Kubernetes API server on your cluster.
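If you can get onto the node (or are shipping system logs somewhere), something along these lines can help show whether the hook fired or failed:

```bash
# On the node, before it is terminated: watch for pre-stop hook activity
# reported by the kubelet.
journalctl -u kubelet -f | grep -iE 'prestop|ebs-csi-node'

# From outside the node: hook failures also show up as pod events.
kubectl -n kube-system get events --field-selector reason=FailedPreStopHook
```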
@ConnorJC3 @alexandermarston thanks for the input. I thought all preStop logs would go to Events; that's why I was watching them.
I will do both of these things. Given I have only 3 nodes with 3 pods, I will:
- on the first one, run k exec <EBS-CSI-POD-NAME> -- "/bin/aws-ebs-csi-driver" "pre-stop-hook" as soon as the StatefulSet pod for my workload on that node goes into Terminating state
- on the other 2, tail the kubelet logs to check what happens
and report back here