aws-ebs-csi-driver
Race Condition - PVs don't get reused when starting new node
/kind bug
What happened?
I'm running into an issue where existing PVs are only reused if a node is available at the time of PVC creation.
When I scale up the pods, they create new PVCs right away (while the pod moves into ContainerCreating). If nodes are available, an existing PV is bound to the PVC immediately. If no nodes are available, the PVC stays Pending, and as soon as a new node becomes Ready, a new PV is provisioned and bound even though there are 100+ existing PVs that meet the requirements. If I then schedule another pod onto that new node, an existing PV is used for the subsequent pods attached to the node. It's worth noting that I am scaling nodes with Karpenter and have locked it down to a single availability zone, so all PVs are in one zone.
I've ended up with hundreds of PVs for something that dynamically scales between 0 and 6 pods. This is an actions-runner from the actions-runner-controller, used to run GitHub Actions on EKS.
Additional Testing
I deleted all the PVs. Then, in a single AZ:
- I created 60 pods, which created 60 PVs.
- I scaled to 0, waited a while, made sure everything was Available, then scaled back to 60. This created 28 more PVs, for a total of 88; the rest of the pods were bound to existing PVs.
- I repeated the cycle, and this time it created 25 more PVs, for a total of 113. This was because some 2xl nodes allowed more pods to join.
It seems that the first pod to join a node creates a new PV, while the second (and sometimes third) pod to join uses an existing PV.
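For reference, the growth can be quantified by filtering `kubectl get pv` on the STATUS column before and after each scale cycle. A minimal sketch — the sample output below is fabricated for illustration; on a real cluster you would pipe in live `kubectl` output instead:

```shell
# Fabricated sample of 'kubectl get pv --no-headers' output (illustration only).
pv_output='pvc-aaa   22Gi   RWO   Retain   Bound       actions-runner-system/var-lib-docker-0   arc-cache-infra-tests
pvc-bbb   22Gi   RWO   Retain   Available   arc-cache-infra-tests
pvc-ccc   22Gi   RWO   Retain   Available   arc-cache-infra-tests'

# Field 5 is the STATUS column; count PVs sitting unused.
# Live-cluster equivalent: kubectl get pv --no-headers | awk '$5 == "Available"' | wc -l
echo "$pv_output" | awk '$5 == "Available"' | wc -l   # -> 2
```

If reuse worked as expected, the Available count would drop back toward zero on scale-up instead of new PVs appearing alongside it.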
Relevant Logs
The only logs the csi-controller produces are:
I1024 14:16:00.140116 1 cloud.go:713] "Waiting for volume state" volumeID="vol-07da60cb4e75fa23b" actual="attaching" desired="attached"
I1024 14:16:45.736756 1 cloud.go:713] "Waiting for volume state" volumeID="vol-017224c77fe3e01f6" actual="attaching" desired="attached"
And the ebs-csi-node pod that comes up in response to the new node shows:
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I1024 14:16:39.952320 1 driver.go:75] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.19.0"
I1024 14:16:39.952362 1 node.go:85] "regionFromSession Node service" region=""
I1024 14:16:39.952371 1 metadata.go:85] "retrieving instance data from ec2 metadata"
I1024 14:16:39.953346 1 metadata.go:92] "ec2 metadata is available"
I1024 14:16:39.953741 1 metadata_ec2.go:25] "Retrieving EC2 instance identity metadata" regionFromSession=""
I1024 14:16:49.743118 1 mount_linux.go:517] Disk "/dev/nvme1n1" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/nvme1n1]
I1024 14:16:50.081231 1 mount_linux.go:528] Disk successfully formatted (mkfs): ext4 - /dev/nvme1n1 /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/ad9bcd0a40bcd21382425af4ee754c0bd51e9e1a07000680a9e75a86ab0bb7d5/globalmount
I1024 14:16:50.081317 1 mount_linux.go:245] Detected OS without systemd
These seem to pertain to root volume provisioning (which is working well); my concern is with the mounted volume:
volumeMounts:
  - name: var-lib-docker
    mountPath: /var/lib/docker
...
volumeClaimTemplates:
  - metadata:
      name: var-lib-docker
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 22Gi
      storageClassName: arc-cache-infra-tests
which uses the storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: arc-cache-infra-tests
  labels:
    content: arc-cache-infra-tests
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Expected Behavior
PVs should be reused when a new node starts. New PVs should only be created when existing PVs are unavailable.
Reproduction Steps
How to reproduce it (as minimally and precisely as possible)? You can use the actions-runners, but I have also simulated this with StatefulSets to make it easier to reproduce.
# StorageClass yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: arc-cache-infra-tests
  labels:
    content: arc-cache-infra-tests
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
# StatefulSet yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: busybox-statefulset
  namespace: actions-runner-system
spec:
  serviceName: "busybox"
  replicas: 20
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      serviceAccountName: runner-sa
      tolerations:
        - key: purpose
          operator: Equal
          value: github-runner
          effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: purpose
                    operator: In
                    values:
                      - github-runner
      containers:
        - name: busybox
          image: busybox
          command: ["tail", "-f", "/dev/null"]
          resources:
            requests:
              cpu: "1500m"
              memory: "1500Mi"
            limits:
              cpu: "1500m"
              memory: "1500Mi"
          volumeMounts:
            - name: var-lib-docker
              mountPath: /var/lib/docker
  volumeClaimTemplates:
    - metadata:
        name: var-lib-docker
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 22Gi
        storageClassName: arc-cache-infra-tests
# Karpenter Provisioner and AWSNodeTemplate
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: github-runner-testing-cpu-75-c7a
spec:
  weight: 50
  limits:
    resources:
      cpu: '300'
  providerRef:
    name: github-runner-75
  consolidation:
    enabled: false
  ttlSecondsUntilExpired: 600 # 10 mins
  ttlSecondsAfterEmpty: 600 # 10 mins
  taints:
    - key: purpose
      value: github-runner
      effect: NoSchedule
  labels:
    scheduler: karpenter
    purpose: github-runner
    constraint: cpu # cpu or memory
    size: large
    lifecycle: ephemeral # ephemeral or persistent
    usage: testing
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: [spot]
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: [c7a]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: [xlarge]
    - key: topology.kubernetes.io/zone
      operator: In
      values: [us-west-2a]
    - key: kubernetes.io/os
      operator: In
      values:
        - linux
    - key: kubernetes.io/arch
      operator: In
      values:
        - amd64
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: github-runner-75
spec:
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 75Gi
        volumeType: gp3
        encrypted: true
  subnetSelector:
    karpenter.sh/discovery: primary-cluster
  securityGroupSelector:
    karpenter.sh/discovery: primary-cluster
  instanceProfile: github-instance-profile
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: optional
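With the manifests above applied, the scale cycle from the Additional Testing section can be sketched as a script. This is illustrative only: the resource names come from the manifests above, the sleep duration is a guess based on ttlSecondsAfterEmpty, and a live cluster is required.

```shell
#!/usr/bin/env sh
# Sketch of the repro cycle; assumes the StorageClass, StatefulSet, and
# Karpenter resources above are already applied to the cluster.
set -eu
NS=actions-runner-system

# 1. Scale up; WaitForFirstConsumer defers binding until pods are scheduled.
kubectl -n "$NS" scale statefulset busybox-statefulset --replicas=60
kubectl -n "$NS" rollout status statefulset busybox-statefulset

# 2. Scale to 0 and wait for Karpenter to reclaim the empty nodes
#    (ttlSecondsAfterEmpty is 600 in the Provisioner above).
kubectl -n "$NS" scale statefulset busybox-statefulset --replicas=0
sleep 900

# 3. Scale up again on fresh nodes and compare the PV count; if PVs were
#    reused it should still be 60, but in practice it grows.
kubectl -n "$NS" scale statefulset busybox-statefulset --replicas=60
kubectl -n "$NS" rollout status statefulset busybox-statefulset
kubectl get pv --no-headers | grep -c arc-cache-infra-tests
```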
Environment: AWS EKS
- Kubernetes version (use kubectl version): Client Version: v1.28.1; Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3; Server Version: v1.27.4-eks-2d98532
- Driver version: 1.19
I see the same issue in the following environment: AWS EKS
- Kubernetes version (use kubectl version): Client Version: v1.26.11; Kustomize Version: v4.5.7; Server Version: v1.25.16-eks-8cb36c9
- Driver version: 2.22.0
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale