aws-ebs-csi-driver
EBS CSI Driver attaches node backed volume instead of PVC
/kind bug
We have
- EKS Kubernetes Cluster
- AWS EBS CSI driver installed through the AWS managed add-ons (version v1.4.0-eksbuild.preview)
- A StatefulSet for our service (1 replica)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  ...
spec:
  ...
      containers:
        - ...
          volumeMounts:
            - mountPath: /ipfscache
              name: ipfscache
  volumeClaimTemplates:
    - metadata:
        name: ipfscache
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: io2
        resources:
          requests:
            storage: 10000Gi
- IO2 StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io2
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  fsType: ext4
  iopsPerGB: "10"
volumeBindingMode: WaitForFirstConsumer
- A Pod (part of the StatefulSet)
- An attached EBS volume of roughly 10 TB with the io2 storage class (provisioned dynamically by the AWS EBS CSI driver)
- The workload is running on m5.8xlarge instances
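For reference, the claim created from the volumeClaimTemplate and the dynamically provisioned PV can be checked with standard kubectl commands; the names in angle brackets are placeholders for our actual resources.
# PVC created from the volumeClaimTemplate (named <claim>-<statefulset>-<ordinal>)
kubectl get pvc ipfscache-<sts-name>-0 -o wide
# Bound PV and the EBS volume ID behind it
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}{"\n"}'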
What happened?
When a pod is initialized, the EBS volume is successfully attached and mounted into the pod. However, sometimes when the pod is deleted and recreated for any reason, such as configuration changes to the StatefulSet, the EBS volume doesn't attach properly to the pod. To make the problem worse, the pod still starts, but with an invalid volume mount: the directory path that is expected to be the PVC-backed volume is in fact backed by the EC2 instance's root volume.
This is the result of running df -h when the pod state is correct.
$ df -h
overlay 100.0G 5.3G 94.7G 5% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 61.5G 0 61.5G 0% /sys/fs/cgroup
/dev/nvme1n1 9.6T 3.8T 5.8T 39% /ipfscache
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /dev/termination-log
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /etc/resolv.conf
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /etc/hostname
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 61.5G 12.0K 61.5G 0% /run/secrets/kubernetes.io/serviceaccount
tmpfs 61.5G 4.0K 61.5G 0% /run/secrets/eks.amazonaws.com/serviceaccount
tmpfs 61.5G 0 61.5G 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/latency_stats
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 64.0M 0 64.0M 0% /proc/sched_debug
tmpfs 61.5G 0 61.5G 0% /sys/firmware
And this is the result when the pod is in a bad state
$ df -h
overlay 100.0G 5.3G 94.7G 5% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 61.5G 0 61.5G 0% /sys/fs/cgroup
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /ipfscache
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /dev/termination-log
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /etc/resolv.conf
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /etc/hostname
/dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 61.5G 12.0K 61.5G 0% /run/secrets/kubernetes.io/serviceaccount
tmpfs 61.5G 4.0K 61.5G 0% /run/secrets/eks.amazonaws.com/serviceaccount
tmpfs 61.5G 0 61.5G 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/latency_stats
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 64.0M 0 64.0M 0% /proc/sched_debug
tmpfs 61.5G 0 61.5G 0% /sys/firmware
Note the /dev/nvme0n1p1 100.0G 5.3G 94.7G 5% /ipfscache line.
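From inside the pod, the bad state can also be confirmed without reading the full df output by looking at the device that backs the mount point; this is just a diagnostic sketch and assumes findmnt is available in the image.
# In the good state this prints the dedicated EBS device (/dev/nvme1n1 above);
# in the bad state it prints the node's root partition (/dev/nvme0n1p1).
findmnt -n -o SOURCE /ipfscache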
When describing the Pod, we can see the following events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m6s default-scheduler Successfully assigned XXXXX to ip-XXXX.us-east-2.compute.internal
Warning FailedMount 3m3s kubelet Unable to attach or mount volumes: unmounted volumes=[ipfscache], unattached volumes=[kube-api-access-XXXX aws-iam-token ipfscache]: timed out waiting for the condition
Warning FailedMount 2m45s (x7 over 3m17s) kubelet MountVolume.SetUp failed for volume "pvc-XXXX" : rpc error: code = Aborted desc = An operation with the given volume="vol-XXXX" is already in progress
Normal Pulled 2m12s kubelet Container image "XXXXXXXXXXXX" already present on machine
Normal Created 2m12s kubelet Created container XXXXX
Normal Started 2m12s kubelet Started container XXXXX
Tailing the EBS plugin pod logs, we can see a few different warnings. Note that the timelines don't exactly match because they are not from the same occurrence of this issue.
...
W0204 19:19:55.218272 1 cloud.go:547] Ignoring error from describe volume for volume "vol-XXXX"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
...
E0204 19:19:55.218315 1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-XXXX" from node "i-XXXX": timed out waiting for the condition
....
W0210 21:17:24.562919 1 cloud.go:480] DetachDisk called on non-attached volume: vol-XXXX
Tailing the csi-attacher sidecar logs, we can see other warnings.
I0210 21:17:24.661529 1 csi_handler.go:281] Failed to save detach error to "csi-XXXX": volumeattachments.storage.k8s.io "csi-XXXX" not found
I0210 21:17:24.661547 1 csi_handler.go:226] Error processing "csi-XXXX": failed to detach: could not mark as detached: volumeattachments.storage.k8s.io "csi-XXXX" not found
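For reference, we pull these two log streams from the controller pod roughly as follows; the label selector and container names assume the defaults used by the EKS add-on and may differ for other installs.
# ebs-plugin container of the controller (attach/detach path)
kubectl logs -n kube-system -l app=ebs-csi-controller -c ebs-plugin --tail=200
# csi-attacher sidecar in the same pod (VolumeAttachment handling)
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-attacher --tail=200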
What you expected to happen?
The pod shouldn't start with an invalid volume mount. This can cause data loss and corruption.
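As a stop-gap, a start-up guard can at least turn the silent misbehaviour into a hard failure. This is only a sketch (for an initContainer or the entrypoint), under the assumption that a size threshold between the node's 100G root disk and the 10000Gi PVC is enough to tell the two apart:
#!/bin/sh
# Refuse to start if /ipfscache is not backed by the big EBS volume.
# The PVC requests 10000Gi while the node root disk is only 100G, so a
# threshold well between the two (1 TiB here) distinguishes the cases.
size_kb=$(df -Pk /ipfscache | awk 'NR==2 {print $2}')
if [ "$size_kb" -lt $((1024 * 1024 * 1024)) ]; then
  echo "FATAL: /ipfscache is only ${size_kb} KiB; not the expected EBS volume" >&2
  exit 1
fi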
How to reproduce it (as minimally and precisely as possible)?
We can't reproduce it intentionally or pinpoint the exact sequence of events that puts the pod's volume into the bad state. However, it happens at least once or twice a day when new configuration changes are rolled out to the StatefulSet, i.e. after the pod is deleted and recreated.
From the logs, we speculate that when the pod is deleted and a new one is created with the updated configuration, it takes a while for the volume to detach and re-attach. This could be due to the volume size we are using, but we are not sure.
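One thing that seems worth checking the next time it happens is whether a stale VolumeAttachment is still hanging around for the old node while the new pod is already scheduled:
# Any VolumeAttachment still pointing at the old node while the new pod is
# scheduling is a sign the detach has not completed yet.
kubectl get volumeattachments.storage.k8s.io \
  -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached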
Anything else we need to know?:
First, when the pod is in the bad state and I run kubectl describe on the pod, it still shows the directory as backed by a PVC; in other words, the pod description shows the PVC as attached.
Second, the AWS console and CLI show the EBS volume as in-use.
Third, after SSH-ing to the Kubernetes worker node, I can confirm by running df -h there that the mount is indeed backed by the EC2 instance's root volume.
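For anyone checking their own nodes, the mount table on the instance makes the mismatch visible; this is only a diagnostic sketch and relies on the default kubelet paths:
# On the worker node: which devices (if any) are mounted for the pods' CSI volumes.
# In the bad state the affected pod's kubernetes.io~csi target is missing here
# even though the pod is running.
findmnt -lo TARGET,SOURCE,FSTYPE | grep 'kubernetes.io~csi'
# Cross-check against the block devices actually attached to the instance
lsblk -o NAME,SIZE,MOUNTPOINT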
Environment
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
- Driver version:
v1.4.0-eksbuild.preview
Any update here? Also seeing this.
Also seeing something similar - the pod was recreated to give it more CPU and memory, and we are getting:
Warning FailedMount 4m36s kubelet, ip-10-102-127-112.ec2.internal Unable to attach or mount volumes: unmounted volumes=[platform-pulsar-bookkeeper-journal platform-pulsar-bookkeeper-ledgers], unattached volumes=[istio-podinfo platform-pulsar-bookkeeper-journal platform-pulsar-bookkeeper-ledgers platform-pulsar-bookkeeper-token-xczhl istiod-ca-cert istio-data istio-envoy istio-token]: timed out waiting for the condition
Warning FailedMount 117s (x3 over 6m1s) kubelet, ip-10-102-127-112.ec2.internal MountVolume.MountDevice failed for volume "pvc-4c7b87de-8e27-4faf-94be-7fe719c3b5c6" : rpc error: code = Aborted desc = An operation with the given volume="vol-08ecdf3c15bb04a5f" is already in progress
Warning FailedMount 101s (x3 over 5m45s) kubelet, ip-10-102-127-112.ec2.internal MountVolume.MountDevice failed for volume "pvc-bf7a407f-4a85-407f-9e14-ca756fe19f25" : rpc error: code = Aborted desc = An operation with the given volume="vol-00ac669e7644edc35" is already in progress
We are seeing a similar issue on our end.
It has already caused at least two production outages, and we had no luck replicating it in our development environments.
We are using c5d.18xlarge nodes on EKS v1.21 with EBS CSI driver v1.6.1.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle-rotten
Any updates on this issue? We are also facing the same problem, where the mount ends up on the node volume instead of the PVC/EBS volume. We also see ebs-csi-node skip the NodeStageVolume call and go directly to NodePublishVolume, which results in the node-volume mount when the previous unmount failed or timed out.
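If it helps others debugging, the staging/publish ordering is visible in the node plugin logs on the affected instance (depending on the driver's log level); the label and container names below assume the add-on defaults, and <node-name> is a placeholder.
# Node-side CSI calls; a healthy mount shows NodeStageVolume before NodePublishVolume
kubectl logs -n kube-system -c ebs-plugin \
  $(kubectl get pods -n kube-system -l app=ebs-csi-node \
      --field-selector spec.nodeName=<node-name> -o name) \
  | grep -E 'NodeStageVolume|NodePublishVolume'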
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Any update on this? Was anybody able to figure this out? I have been in a loop for about 2 days with the exact same issue (our entire application has been down).
Same here. Application down because of this
This is biting us as well. The ASG replaced the nodes after a new AMI was specified. The EBS CSI driver shows the PVCs as Bound while the AWS console shows the volumes as available. This bug is not solved.
I'm facing the same issue. The deployment strategy is Recreate, but the volume fails to detach from the old node.
Was this ever addressed? Still seeing this on some volumes.