
Pod Fails with: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. efs

ptwohig opened this issue on Jun 23, 2022

/kind bug

What happened?

When deployed via Helm, the EFS driver attempts to run two pods. One pod runs fine; the other fails with the following message:

0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. efs

What you expected to happen?

I'm not sure what the intended behavior is. I suspect it should be one of the following:

  • All pods start up properly.
  • Two pods using the same port aren't launched on the same node.

How to reproduce it (as minimally and precisely as possible)?

I am running a simple cluster that deploys a single m5.medium node in two AZs. I believe this is the minimum required node pool.

Anything else we need to know?:

I've tried two separate ways to install the driver. This happens with the instructions provided as well as my Terraform configuration.

helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm upgrade --install aws-efs-csi-driver --namespace kube-system aws-efs-csi-driver/aws-efs-csi-driver

Or

resource "helm_release" "efs_csi_driver" {
  name = "aws-efs-csi-driver"
  chart = "aws-efs-csi-driver"
  repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
  namespace = "kube-system"
}
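
In both cases the failure looks the same; roughly, this is how I observe it (the exact pod name will differ in your cluster):

kubectl -n kube-system get pods -o wide | grep efs-csi
# One efs-csi-controller-* pod stays Pending; its events show the port conflict:
kubectl -n kube-system describe pod <pending-efs-csi-controller-pod>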

If I were to guess, it's creating one pod for each AZ, but since I'm running only one node it's trying to place both pods on the same machine, resulting in the port conflict.

The problem for Terraform users is that if all pods aren't healthy within five minutes of launch, the deployment fails.
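
For what it's worth, a minimal sketch of a workaround for the five-minute limit, using the wait and timeout arguments of the helm_release resource (timeout is in seconds and defaults to 300): either lengthen the window or stop Terraform from waiting on pod health at all. The numbers below are illustrative, not recommendations.

resource "helm_release" "efs_csi_driver" {
  name       = "aws-efs-csi-driver"
  chart      = "aws-efs-csi-driver"
  repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
  namespace  = "kube-system"

  # Allow more than the default 300 seconds for all pods to become healthy,
  # or set wait = false so the release isn't failed on a Pending pod.
  timeout = 900
  wait    = true
}

That only avoids the Terraform failure, of course; the Pending pod is still unschedulable.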

Environment

  • Kubernetes version (use kubectl version):
Client Version: v1.24.2
Kustomize Version: v4.5.4
Server Version: v1.22.9-eks-a64ea69
  • Driver version: Unsure. I'm presuming the master version.

ptwohig avatar Jun 23 '22 19:06 ptwohig

Yep. Confirmed that adding a second node to my pool solves the issue.

ptwohig avatar Jun 23 '22 20:06 ptwohig

Now this sounds a lot like this issue: https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/563

The long and short of that issue was that, because people wanted to run the driver in environments where EC2's IMDS (Instance Metadata Service) was inaccessible, the driver had to be installed with hostNetwork: true set on the deployment pods. That causes a problem because once one pod using host networking binds a port, nothing else on that node can bind it. Since the driver deploys both a DaemonSet and a Deployment, you end up in situations like the one you describe, where the minimum number of nodes you can realistically run is 2.
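
A rough way to confirm that's what you're hitting (assuming the chart's default controller Deployment name, efs-csi-controller, in kube-system) is to check whether the controller pod template asks for host networking and which ports its containers declare:

# Does the controller pod template use host networking?
kubectl -n kube-system get deployment efs-csi-controller \
  -o jsonpath='{.spec.template.spec.hostNetwork}{"\n"}'

# Which ports (including hostPorts) do its containers declare?
kubectl -n kube-system get deployment efs-csi-controller \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.ports}{"\n"}{end}'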

However, now that https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/681 has merged and been released as driver version v1.4.0 (Helm chart 2.2.7), you should be able to install that and find that the driver works with only 1 node. This is because the controller pods can now draw IPs and use ports through pod networking rather than the host's networking, while the DaemonSet continues to use the host network without contention. Let me know if this works; as one of the team that enabled this, I'd be really excited to know it's actually helping someone.
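
For reference, pinning the chart version on install or upgrade looks roughly like this (2.2.7 being the chart release that carries driver v1.4.0; later versions should also work):

helm repo update
helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system \
  --version 2.2.7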

jonathanrainer avatar Jun 24 '22 08:06 jonathanrainer

Hey @jonathanrainer! I had the same issue with an old version of the driver. Can confirm that it is now working with 1 node and Helm Chart 2.2.7.

dremmos avatar Aug 12 '22 20:08 dremmos

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 10 '22 20:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 10 '22 21:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 09 '23 21:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 09 '23 21:01 k8s-ci-robot

I think this problem is mostly solved by https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/681, and https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/851 should be the last piece.

bigwheel avatar Jan 12 '23 11:01 bigwheel

Any update on this?

I'm getting the same error with Helm chart version 2.4.4.

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  3m36s (x2 over 8m48s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

Terraform:

resource "helm_release" "efs_csi_driver" {
  name        = "aws-efs-csi-driver"
  chart       = "aws-efs-csi-driver"
  repository  = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
  version     = "2.4.4"
  namespace   = "kube-system"
  description = "AWS EFS CSI Driver helm Chart deployment configuration"
}
h1manshu98 avatar Aug 08 '23 08:08 h1manshu98

As of today, this problem still exists.

4m2s        Warning   FailedScheduling    pod/efs-csi-controller-6ffcb4cf58-r5cs9    0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..

helm version = "~> 2.10.1 k88 version 1.27 efs CSI driver version v1.5.8 and v1.5.9

anandshivam44 avatar Aug 13 '23 18:08 anandshivam44