aws-ebs-csi-driver
Volume still hangs on Karpenter Node Consolidation/Termination
/kind bug
What happened? As discussed in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1665, @torredil said it was fixed in v1.27 (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1665#issuecomment-1898607940), but we still see the problem with v1.28.
- A pod using volume pv-A is running on node N1
- Karpenter terminates the pod and terminates node N1
- K8s starts a new pod and tries to attach volume pv-A, but the volume still takes 6 minutes to be released and attached to the new pod
What you expected to happen?
- After the old pod has been terminated, pv-A should be released and attachable to the new pod
How to reproduce it (as minimally and precisely as possible)?
- I'm using https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html to set up a cluster with 3 nodes:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev
spec:
  version: 8.12.2
  volumeClaimDeletePolicy: DeleteOnScaledownAndClusterDeletion
  updateStrategy:
    changeBudget:
      maxSurge: 2
      maxUnavailable: 1
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
    podTemplate:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
          topology.kubernetes.io/zone: eu-central-1a
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 5Gi
              cpu: 1
            limits:
              memory: 5Gi
              cpu: 2
    config:
      node.store.allow_mmap: false
```
- Trigger a spot instance termination or just delete one EC2 instance
- The node is removed from k8s very quickly, the old pod is Terminated, and k8s starts a new pod
- The new pod is stuck for 6 minutes with the error: Multi-Attach error for volume "pvc-xxxxx-xxxxx-xxx" Volume is already exclusively attached to one node and can't be attached to another (see the watch commands below, after the controller logs)
- After 6 minutes the new pod can attach the volume
- Here are the logs of ebs-csi-controller:
I0302 06:12:10.305080 1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1" devicePath="/dev/xvdaa"
<< at 06:14 the node has been terminated but no logs here >>
I0302 06:20:18.486042 1 controller.go:471] "ControllerUnpublishVolume: detaching" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:18.584737 1 cloud.go:792] "DetachDisk: called on non-attached volume" volumeID="vol-02b33186429105461"
I0302 06:20:18.807752 1 controller.go:474] "ControllerUnpublishVolume: attachment not found" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:19.124534 1 controller.go:421] "ControllerPublishVolume: attaching" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb"
I0302 06:20:20.635493 1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb" devicePath="/dev/xvdaa"
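To watch the stuck attachment from outside while the new pod is pending, I use roughly these commands (the namespace, pod name, and volume ID are from my setup and will differ elsewhere):

```bash
# Watch VolumeAttachment objects: the attachment to the terminated node
# must disappear before a new one can be created for the new node.
kubectl get volumeattachments -o wide -w

# Show the Multi-Attach / FailedAttachVolume events on the pending pod.
kubectl -n datalayer describe pod dev-es-default-0 | tail -n 20

# What EC2 itself thinks the attachment state is (needs AWS CLI credentials).
aws ec2 describe-volumes --volume-ids vol-02b33186429105461 \
  --query 'Volumes[0].Attachments'
```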
Anything else we need to know?: I set up the CSI driver using the EKS add-on.
Environment
- Kubernetes version (use kubectl version):
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0-eks-c417bb3
- Driver version:
v1.28.0-eksbuild.1
I updated to Karpenter 0.35 and am using the AL2023 image; the problem still happens.
Some update: it seems to be caused by the Elasticsearch StatefulSet. Terminating the ES pod is quite slow, so it's possible the driver pod is killed before the ES pod, and then the volume is not released.
Update: when I set PRE_STOP_ADDITIONAL_WAIT_SECONDS to 5 seconds, the new pod can attach the PVC normally, so I think the long delay in stopping the pod is the reason the volume gets stuck.
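(For reference, I set the variable roughly like this; since the driver is installed as an EKS add-on the override might be reverted by add-on updates, so treat it as a sketch:)

```bash
# Add the env var to the ebs-plugin container of the node DaemonSet
# and wait for the rollout to finish.
kubectl -n kube-system set env daemonset/ebs-csi-node -c ebs-plugin \
  PRE_STOP_ADDITIONAL_WAIT_SECONDS=5
kubectl -n kube-system rollout status daemonset/ebs-csi-node
```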
Nope, on another try the PV still got stuck 🤦
Hey there @levanlongktmt thank you for reporting this.
It appears your spot instances are being ungracefully terminated. Take a look at the documentation for guidance on navigating this issue: 6-Minute Delays in Attaching Volumes - What steps can be taken to mitigate this issue?
If you are still running into delays but have already configured the Kubelet and enabled interruption handling in Karpenter, please let us know. Thanks!
@torredil oops... my user data was missing the last line:
systemctl restart kubelet
Let me try with it.
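For context, the graceful-shutdown part of my user data now looks roughly like this (a sketch of the kubelet settings the mitigation doc describes; the config path is the AL2 EKS AMI default and may differ on other AMIs):

```bash
#!/bin/bash
# Give kubelet a graceful node shutdown window so workload pods and the
# ebs-csi-node pod have time to clean up before the instance disappears.
KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
echo "$(jq '.shutdownGracePeriod="45s"' $KUBELET_CONFIG)" > $KUBELET_CONFIG
echo "$(jq '.shutdownGracePeriodCriticalPods="15s"' $KUBELET_CONFIG)" > $KUBELET_CONFIG
systemctl restart kubelet
```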
Hey @torredil, I tried but still no luck :(
I already added the SQS queue to handle interruptions, so when I trigger an interruption in the AWS console, Karpenter handles it and launches a new instance very quickly. I also followed the logs of ebs-csi-node, but it doesn't show anything related to the preStop hook; here are all of its logs:
I0304 15:40:38.249081 1 driver.go:83] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.28.0"
I0304 15:40:38.250134 1 node.go:93] "regionFromSession Node service" region=""
I0304 15:40:38.252657 1 metadata.go:85] "retrieving instance data from ec2 metadata"
I0304 15:40:38.257835 1 metadata.go:92] "ec2 metadata is available"
I0304 15:40:38.258488 1 metadata_ec2.go:25] "Retrieving EC2 instance identity metadata" regionFromSession=""
I0304 15:47:29.651847 1 mount_linux.go:243] Detected OS without systemd
After the graceful shutdown of the Elasticsearch pod, the ebs-csi-node pod was killed immediately, so the preStop hook was ignored.
I thought it might be caused by a bug in AL2023, so I switched back to AL2, but still no luck.
I will try setting InhibitDelayMaxSec to 60 (45 + 15).
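For the record, I'm applying it in user data roughly like this (a sketch; the drop-in file name is my own choice):

```bash
# Let systemd-logind honor kubelet's shutdown inhibitor lock for up to 60s
# (shutdownGracePeriod 45s + shutdownGracePeriodCriticalPods 15s).
mkdir -p /etc/systemd/logind.conf.d
cat <<'EOF' > /etc/systemd/logind.conf.d/99-inhibit-delay.conf
[Login]
InhibitDelayMaxSec=60
EOF
systemctl restart systemd-logind
```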
Still no luck: the volume is still stuck for 6 minutes and the preStop hook is not working 😭
@levanlongktmt Thanks for the followup info. Something that came to mind here is that it might be possible for newer versions of Karpenter to be incompatible with the pre-stop lifecycle hook due to this change: https://github.com/kubernetes-sigs/karpenter/pull/508.
You'll notice that the LCH won't run if the following taint is not present on the node during a termination event:
node.kubernetes.io/unschedulable
We'll take a closer look and report back with more details or a fix. Thank you!
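In the meantime, one way to check which taints are actually present while the node is being drained (purely for diagnosis; the taint keys differ between Karpenter versions) is something like:

```bash
# Print the taints on the draining node; the pre-stop hook only waits for
# volume cleanup if it recognizes a "node is being drained" taint.
kubectl get node <NODE_NAME> \
  -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
```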
@torredil here are the logs of Karpenter and the Elasticsearch pod. As far as I can see:
- At 16:22:20 Karpenter gets the interruption message
- Very quickly Karpenter taints the node and the Elasticsearch pod gets the shutdown signal
- After 46 seconds the Elasticsearch pod is terminated
- At 16:23:09 Karpenter starts deleting the node, and the node is deleted very quickly
Logs of Karpenter
{"level":"INFO","time":"2024-03-04T16:22:20.270Z","logger":"controller.interruption","message":"initiating delete from interruption message","commit":"2c8f2a5","queue":"Karpenter-kollekt-eks","messageKind":"SpotInterruptionKind","nodeclaim":"default-2bd27","action":"CordonAndDrain","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:22:20.336Z","logger":"controller.node.termination","message":"tainted node","commit":"2c8f2a5","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:22:21.364Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"24.462475ms"}
{"level":"INFO","time":"2024-03-04T16:22:21.364Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"2c8f2a5","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2024-03-04T16:22:21.379Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"2c8f2a5","nodepool":"default","nodeclaim":"default-pk44s","requests":{"cpu":"680m","memory":"5240Mi","pods":"5"},"instance-types":"a1.2xlarge, a1.4xlarge, a1.metal, a1.xlarge, c6g.12xlarge and 91 other(s)"}
{"level":"INFO","time":"2024-03-04T16:22:23.782Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"2c8f2a5","nodeclaim":"default-pk44s","provider-id":"aws:///eu-central-1a/i-09a52eea14ff0b6c9","instance-type":"r6gd.medium","zone":"eu-central-1a","capacity-type":"spot","allocatable":{"cpu":"940m","ephemeral-storage":"17Gi","memory":"7075Mi","pods":"8","vpc.amazonaws.com/pod-eni":"4"}}
{"level":"INFO","time":"2024-03-04T16:22:31.356Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"16.586479ms"}
{"level":"INFO","time":"2024-03-04T16:22:41.353Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"13.509969ms"}
{"level":"INFO","time":"2024-03-04T16:22:51.355Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"14.534034ms"}
{"level":"INFO","time":"2024-03-04T16:22:58.280Z","logger":"controller.nodeclaim.lifecycle","message":"registered nodeclaim","commit":"2c8f2a5","nodeclaim":"default-pk44s","provider-id":"aws:///eu-central-1a/i-09a52eea14ff0b6c9","node":"i-09a52eea14ff0b6c9.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:23:01.356Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"13.781092ms"}
{"level":"INFO","time":"2024-03-04T16:23:09.571Z","logger":"controller.node.termination","message":"deleted node","commit":"2c8f2a5","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal"}
{"level":"INFO","time":"2024-03-04T16:23:09.944Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"2c8f2a5","nodeclaim":"default-2bd27","node":"i-0502b920c79f40b5a.eu-central-1.compute.internal","provider-id":"aws:///eu-central-1a/i-0502b920c79f40b5a"}
{"level":"INFO","time":"2024-03-04T16:23:10.455Z","logger":"controller.disruption","message":"pod \"datalayer/dev-es-default-0\" has a preferred Anti-Affinity which can prevent consolidation","commit":"2c8f2a5"}
{"level":"INFO","time":"2024-03-04T16:23:10.698Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"83.215948ms"}
{"level":"INFO","time":"2024-03-04T16:23:10.731Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"2c8f2a5","nodeclaim":"default-pk44s","provider-id":"aws:///eu-central-1a/i-09a52eea14ff0b6c9","node":"i-09a52eea14ff0b6c9.eu-central-1.compute.internal","allocatable":{"cpu":"940m","ephemeral-storage":"18233774458","hugepages-1Gi":"0","hugepages-2Mi":"0","hugepages-32Mi":"0","hugepages-64Ki":"0","memory":"7552796Ki","pods":"8"}}
{"level":"INFO","time":"2024-03-04T16:23:12.487Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"2c8f2a5","pods":"datalayer/dev-es-default-0","duration":"12.009019ms"}
Logs of elasticsearch pod
{"@timestamp": "2024-03-04T16:22:20+00:00", "message": "retrieving node ID", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp": "2024-03-04T16:22:21+00:00", "message": "initiating node shutdown", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp":"2024-03-04T16:22:21.338Z", "log.level": "INFO", "message":"creating shutdown record {nodeId=[qElhr0BPQnufjz5_CdDGMg], type=[RESTART], reason=[pre-stop hook]}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][masterService#updateTask][T#2]","log.logger":"org.elasticsearch.xpack.shutdown.TransportPutShutdownNodeAction","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:22:21.530Z", "log.level": "INFO", "message":"Aborting health node task due to node [{dev-es-default-0}{qElhr0BPQnufjz5_CdDGMg}] shutting down.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][clusterApplierService#updateTask][T#1]","log.logger":"org.elasticsearch.health.node.selection.HealthNodeTaskExecutor","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:22:21.535Z", "log.level": "INFO", "message":"Starting node shutdown sequence for ML", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][clusterApplierService#updateTask][T#1]","log.logger":"org.elasticsearch.xpack.ml.MlLifeCycleService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp": "2024-03-04T16:22:21+00:00", "message": "waiting for node shutdown to complete", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp": "2024-03-04T16:22:21+00:00", "message": "delaying termination for 44 seconds", "ecs.version": "1.2.0", "event.dataset": "elasticsearch.pre-stop-hook"}
{"@timestamp":"2024-03-04T16:23:05.782Z", "log.level": "INFO", "message":"stopping ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.783Z", "log.level": "INFO", "message":"shutting down watcher thread", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[file-watcher[/usr/share/elasticsearch/config/operator/settings.json]]","log.logger":"org.elasticsearch.common.file.AbstractFileWatchingService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.785Z", "log.level": "INFO", "message":"watcher service stopped", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.common.file.AbstractFileWatchingService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.790Z", "log.level": "INFO", "message":"[controller/83] [Main.cc@176] ML controller exiting", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.logging.CppLogMessageHandler","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.792Z", "log.level": "INFO", "message":"Native controller process has stopped - no new native processes can be started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.NativeController","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.793Z", "log.level": "INFO", "message":"stopping watch service, reason [shutdown initiated]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.xpack.watcher.WatcherService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:05.794Z", "log.level": "INFO", "message":"watcher has stopped and shutdown", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[dev-es-default-0][generic][T#3]","log.logger":"org.elasticsearch.xpack.watcher.WatcherLifeCycleService","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"timestamp": "2024-03-04T16:23:06+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"@timestamp":"2024-03-04T16:23:06.552Z", "log.level": "INFO", "message":"stopped", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.554Z", "log.level": "INFO", "message":"closing ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.576Z", "log.level": "INFO", "message":"evicted [0] entries from cache after reloading database [/tmp/elasticsearch-3976228296045036888/geoip-databases/qElhr0BPQnufjz5_CdDGMg/GeoLite2-Country.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.577Z", "log.level": "INFO", "message":"evicted [0] entries from cache after reloading database [/tmp/elasticsearch-3976228296045036888/geoip-databases/qElhr0BPQnufjz5_CdDGMg/GeoLite2-ASN.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.578Z", "log.level": "INFO", "message":"evicted [25] entries from cache after reloading database [/tmp/elasticsearch-3976228296045036888/geoip-databases/qElhr0BPQnufjz5_CdDGMg/GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
{"@timestamp":"2024-03-04T16:23:06.586Z", "log.level": "INFO", "message":"closed", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"Z_NFF7KcT3OJ6ekUUoJ3Qg","elasticsearch.node.id":"qElhr0BPQnufjz5_CdDGMg","elasticsearch.node.name":"dev-es-default-0","elasticsearch.cluster.name":"dev"}
@torredil do you think this quick and dirty fix will work 😆?
@torredil it seems k8s does not call preStop, because if it did I would at least see the log PreStop: executing PreStop lifecycle hook, but I don't see it.
@torredil any good news on this 😀?
Hi, sorry about the wait - your issue is probably caused by what #1969 solves - Karpenter changed the taints they used when draining nodes and our LCH needs to be changed to account for it. That fix should be available in the next release of the EBS CSI Driver, expected to happen later this month.
Amazing, thanks so much @ConnorJC3 😍 I will test again when the new version is released.
@ConnorJC3 I was trying to understand which version of Karpenter introduced this change. I was about to upgrade my CSI driver to pick up the fix, but I guess I will wait for the next release.
I see https://github.com/kubernetes-sigs/karpenter/pull/508 was merged in October :thinking:
@primeroz here is the list of Karpenter versions affected
v1.29.0 has been released, which contains the fix.
I just upgraded my dev cluster to 1.29.0 while using karpenter-aws 0.35.2, but I still have the same multi-attach problem as before:
- Have a stateful workload running with a PVC mounted
- Delete a node with kubectl delete node
- Karpenter intercepts it through its finalizer and sets the taint:
    - effect: NoSchedule
      key: karpenter.sh/disruption
      value: disrupting
- Pod is evicted: 26s Normal Evicted pod/c-pod-0 Evicted pod
- A new node is created by Karpenter and the pod is moved to it
- Multi-Attach error for volume "pvc-9b50cf31-5a24-4783-8039-a362ad2a7c0d" Volume is already exclusively attached to one node and can't be attached to another for 6+ minutes, until timeout
- Pod starts on the new node
@primeroz do you have the logs from the driver from when your node was disrupted? (Also, it would be useful to increase your log level while collecting these.)
@alexandermarston there are no events whatsoever in the EBS CSI pod running on that node when I delete the node.
I can see the preStop hook is in the pod spec... so I guess that when we delete a node, rather than replace it through Karpenter disruption, the preStop on the ebs-csi-node pod is never run?
Note that in my case I am deleting the node to get it replaced by Karpenter.
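(This is roughly how I checked that the hook is present; the namespace assumes the add-on defaults:)

```bash
# Confirm the preStop hook is defined on the ebs-plugin container of the
# node pod that was running on the deleted node.
kubectl -n kube-system get pod ebs-csi-node-rj6qg \
  -o jsonpath='{.spec.containers[?(@.name=="ebs-plugin")].lifecycle.preStop}'
```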
+ ebs-csi-node-rj6qg › node-driver-registrar
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.381506 1 main.go:135] Version: v2.10.0
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.381558 1 main.go:136] Running node-driver-registrar in mode=
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.381565 1 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.382280 1 main.go:164] Calling CSI driver to discover driver name
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.385954 1 main.go:173] CSI driver name: "ebs.csi.aws.com"
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.385996 1 node_register.go:55] Starting Registration Server at: /registration/ebs.csi.aws.com-reg.sock
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.386143 1 node_register.go:64] Registration Server started at: /registration/ebs.csi.aws.com-reg.sock
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.386255 1 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.832022 1 main.go:90] Received GetInfo call: &InfoRequest{}
ebs-csi-node-rj6qg node-driver-registrar I0322 14:33:43.857979 1 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
ebs-csi-node-rj6qg liveness-probe I0322 14:33:44.130483 1 main.go:133] "Calling CSI driver to discover driver name"
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022623.64,"caller":"driver/driver.go:84","msg":"Driver Information","v":0,"Driver":"ebs.csi.aws.com","Version":"v1.29.0"}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022623.6697,"caller":"driver/node.go:97","msg":"regionFromSession Node service","v":0,"region":""}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022623.6895,"caller":"cloud/metadata.go:85","msg":"retrieving instance data from ec2 metadata","v":0}
ebs-csi-node-rj6qg liveness-probe I0322 14:33:44.133455 1 main.go:141] "CSI driver name" driver="ebs.csi.aws.com"
ebs-csi-node-rj6qg liveness-probe I0322 14:33:44.133475 1 main.go:170] "ServeMux listening" address="0.0.0.0:9808"
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022625.4724,"caller":"cloud/metadata.go:92","msg":"ec2 metadata is available","v":0}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118022626.2078,"caller":"cloud/metadata_ec2.go:25","msg":"Retrieving EC2 instance identity metadata","v":0,"regionFromSession":""}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118023644.7207,"caller":"driver/node.go:875","msg":"Unexpected failure when attempting to remove node taint(s)","err":"isAllocatableSet: driver not found on node ip-10-1-5-195.us-west-2.compute.internal"}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118024155.2048,"caller":"driver/node.go:961","msg":"CSINode Allocatable value is set","v":0,"nodeName":"ip-10-1-5-195.us-west-2.compute.internal","count":25}
ebs-csi-node-rj6qg ebs-plugin {"ts":1711118326119.883,"caller":"[email protected]/mount_linux.go:243","msg":"Detected OS without systemd","v":2}
### Delete node here
- ebs-csi-node-rj6qg › ebs-plugin
- ebs-csi-node-rj6qg › liveness-probe
+ ebs-csi-node-rj6qg › liveness-probe
- ebs-csi-node-rj6qg › node-driver-registrar
+ ebs-csi-node-rj6qg › ebs-plugin
+ ebs-csi-node-rj6qg › node-driver-registrar
- ebs-csi-node-rj6qg › ebs-plugin
- ebs-csi-node-rj6qg › node-driver-registrar
- ebs-csi-node-rj6qg › liveness-probe
@primeroz what exactly do you mean by "delete a node"? Assuming you mean something like deleting the node via the AWS Console - that sounds like ungraceful termination, which will not run pre-stop hooks as it results in an immediate and uninterruptible shutdown.
@ConnorJC3 I mean kubectl delete node XXX, which is something Karpenter supports and manages through its finalizer; it goes through the same steps as a Karpenter-managed disruption - see https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/#8-delete-karpenter-nodes-manually
Anyway, I am now testing with a Karpenter disruption by adding an annotation to the NodePool, which triggers a full replacement of all nodes created by that NodePool since the spec has changed. The same multi-attach error happens.
Collecting logs on the next node termination.
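(For context, this is roughly how I force the replacement; the annotation key is arbitrary and the NodePool name assumes my setup:)

```bash
# Bump an annotation inside the NodePool's template so Karpenter treats the
# existing nodes as drifted and replaces them.
kubectl patch nodepool default --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"force-rotate":"2024-03-22"}}}}}'
```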
OK, I've been able to test this by manually tainting a node and then running the preStop hook with:
k exec ebs-csi-node-l7zzf -- "/bin/aws-ebs-csi-driver" "pre-stop-hook"
The preStop hook is doing everything it should, from what I understand.
Again, from my limited understanding, I imagine the issue is that your service (which is using the volume) is taking longer than the terminationGracePeriodSeconds of the EBS CSI driver to shut down and release the volume. If your service takes longer than 30 seconds to terminate, then the EBS driver will never have a chance to do its work.
You could either try increasing the terminationGracePeriod of the EBS CSI Driver or lowering the grace period of your service.
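Something like this should bump the driver's grace period (the DaemonSet name and namespace assume the add-on defaults, and an add-on update may overwrite the change):

```bash
# Give the ebs-csi-node pod more time than the workload needs to shut down,
# so the preStop hook can wait for volumes to be cleaned up.
kubectl -n kube-system patch daemonset ebs-csi-node --type merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":300}}}}'
```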
> If your service takes longer than 30 seconds to terminate, then the EBS driver will never have a chance to do its work.
> You could either try increasing the terminationGracePeriod of the EBS CSI Driver or lowering the grace period of your service.
This is entirely possible; I am indeed testing with a service that has a higher-than-normal grace period.
I will test this :pray:
@alexandermarston
Increasing terminationGracePeriod: 3700 on the ebs-csi-node DaemonSet did not help. I triggered an update of the nodes in the NodePool by updating an annotation, but it is still failing with multi-attach errors.
Also, what's interesting, to me at least, is that in the logs of the ebs-csi-node pod I never see klog.InfoS("PreStop: executing PreStop lifecycle hook"), so I guess that when Karpenter replaces a node it never triggers a termination of the DaemonSet pods running on it?
I know for sure that Karpenter does not drain the node in the same way that kubectl drain does. I can see that the non-DaemonSet pods are being evicted one by one, but I see nothing on the ebs-csi-node pods.
I have been watching the ebs-csi-node pod and tailing events for it... nothing. It seems like once the non-DaemonSet pods are all evicted, the node just goes away, and so do all the DaemonSet pods on it :thinking:
You won't see the logs from the preStop hook handler, as they are not logged in the same way normal pod stdout is. You will be able to see if the preStop hook failed in the Kubernetes Events though.
Can you try deleting the node again, but manually executing the preStop hook and sharing the output?
k exec <EBS-CSI-POD-NAME> -- "/bin/aws-ebs-csi-driver" "pre-stop-hook"
> You won't see the logs from the preStop hook handler, as they are not logged in the same way normal pod stdout is.
To be specific, Kubernetes logs output from pre-stop hooks to the kubelet log - so you can look there to determine if the hook is running (although this can be difficult on a terminated node if you aren't streaming the logs anywhere).
@alexandermarston's suggestion above is a good idea to check that the hook is able to run and connect to the Kubernetes API server on your cluster.
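If you can get onto the node (or are shipping system logs somewhere), something along these lines can help show whether the hook fired or failed:

```bash
# On the node, before it is terminated: watch for pre-stop hook activity
# reported by the kubelet.
journalctl -u kubelet -f | grep -iE 'prestop|ebs-csi-node'

# From outside the node: hook failures also show up as pod events.
kubectl -n kube-system get events --field-selector reason=FailedPreStopHook
```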
@ConnorJC3 @alexandermarston thanks for the input. I thought all preStop logs would go to Events; that's why I was watching them.
I will do both of these things. Given I have only 3 nodes with 3 pods, I will:
- on the first one, run k exec <EBS-CSI-POD-NAME> -- "/bin/aws-ebs-csi-driver" "pre-stop-hook" as soon as the StatefulSet pod for my workload on that node goes into Terminating state
- on the other 2, tail the kubelet logs to check what happens
and report back here