Could not start amazon-efs-mount-watchdog

Open OuFinx opened this issue 3 years ago • 1 comments
What happened?

I have two regions: main (us-east-1) and replica (us-west-1), connected by VPC peering. EFS is deployed in us-east-1. The EFS has two inbound rules for port 2049: one for 172.240.0.0/16 (the VPC in the main region) and one for 172.241.0.0/16 (the VPC in the replica region).

EKS in us-east-1 connects to the EFS and everything is OK. EKS in us-west-1 can't connect to the EFS. The efs-csi-node pods show:

Mounting command: mount
Mounting arguments: -t efs -o tls fs-029eae49f385be828:/test /var/lib/kubelet/pods/ac6e24ee-46bf-41ba-31a8-b4cb3ad211da/volumes/kubernetes.io~csi/efs-tm2tb/mount
Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
b'mount.nfs4: mount system call failed'
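A note on the first line of that output: the quoted name "aws-efs-csi-dri" looks like the driver binary name aws-efs-csi-driver cut to 15 characters, which is the limit Linux applies to the process name exposed in /proc/<pid>/comm. The following is a minimal sketch of that truncation, assuming the mount helper reads the name of PID 1 to detect the init system. The RECOGNIZED tuple and both function names are hypothetical and are not the helper's actual code.

```python
# Linux stores a task's command name in a 16-byte buffer, one byte of which
# is the NUL terminator, so at most 15 bytes are visible in /proc/<pid>/comm.
TASK_COMM_LEN = 16

# Assumption for illustration only: the set of names treated as init systems.
RECOGNIZED = ("init", "systemd")


def comm_name(process_name: str) -> str:
    """Return the process name as it would appear in /proc/<pid>/comm."""
    raw = process_name.encode("utf-8")[: TASK_COMM_LEN - 1]
    return raw.decode("utf-8", errors="ignore")


def describe_init(process_name: str) -> str:
    """Classify the name of PID 1 the way an init-system check might."""
    name = comm_name(process_name)
    if name in RECOGNIZED:
        return f'recognized init system "{name}"'
    return f'unrecognized init system "{name}"'
```

If that assumption holds, the watchdog message may simply reflect that PID 1 inside the container is the driver itself, and the second line, mount.nfs4: mount system call failed, is the one worth investigating.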

The efs-csi-node pods have this configuration:

node:
  hostAliases:
    "fs-029eae49f385be828":
      ip: 172.240.116.107
      region: us-west-1

The pod that is trying to mount the EFS volume reports:

Unable to attach or mount volumes: unmounted volumes=[persistent-storage-efs], unattached volumes=[persistent-storage-efs cachedir nltk-data kube-api-access-wzdv8]: timed out waiting for the condition

BTW, I spun up an Ubuntu pod and successfully connected to the EFS with the command below, so the replica EKS does have access to the EFS:

mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 172.240.116.107:/ efs
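The manual mount above talks plain NFS to the mount target IP, while the driver's mount uses -t efs -o tls, so the two paths are not identical. To repeat only the network part of the check from other pods or nodes without mounting anything, a small reachability probe can help. This is a sketch, not part of the driver: can_reach and its defaults are hypothetical names, and a successful TCP connection only shows that port 2049 is open, not that a TLS mount will work.

```python
import socket

NFS_PORT = 2049  # the port named in the EFS inbound rules


def can_reach(host: str, port: int = NFS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

For example, can_reach("172.240.116.107") run from a pod in the replica cluster would test the mount target IP used in the manual mount.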

What you expected to happen?

I expected the deployment in the replica region to connect to the EFS.

Anything else we need to know?:

StorageClass config

storageClasses: 
- name: efs-sc
  parameters:
    provisioningMode: efs-ap
  reclaimPolicy: Delete
  volumeBindingMode: Immediate

Environment

  • Kubernetes version (use kubectl version): 1.21
  • Driver version: 2.2.4

OuFinx avatar Sep 05 '22 14:09 OuFinx

I checked again and got the same result. What am I doing wrong?

OuFinx avatar Sep 13 '22 07:09 OuFinx

I have the same problem. I followed the installation instructions (1, 2), deployed a pod, and it failed to connect with the same error. In my case, though, everything is in the same region and the same VPC. I use EKS with managed nodes. The driver was deployed with Helm chart version 2.3.2.

tropnikovvl avatar Nov 17 '22 19:11 tropnikovvl

I got the same error message.
I fixed it by adding the following permissions to the policy of the IAM role assumed by the EFS node pods:

elasticfilesystem:ClientMount
elasticfilesystem:ClientWrite
elasticfilesystem:ClientRootAccess
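For illustration, the three actions listed above could be wrapped in an IAM policy document like the one generated below. This is only a sketch: the "*" resource is a placeholder assumption (scoping it to a specific file system ARN is usually preferable), build_policy is a hypothetical helper name, and how the policy is attached to the role is not shown.

```python
import json

# The three actions named in the comment above.
EFS_CLIENT_ACTIONS = [
    "elasticfilesystem:ClientMount",
    "elasticfilesystem:ClientWrite",
    "elasticfilesystem:ClientRootAccess",
]


def build_policy(resource: str = "*") -> dict:
    """Build an IAM policy document allowing the EFS client actions.

    The default resource "*" is a placeholder, not a recommendation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": EFS_CLIENT_ACTIONS,
                "Resource": resource,
            }
        ],
    }


# Print the document as JSON so it can be pasted into a policy editor.
print(json.dumps(build_policy(), indent=2))
```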

nichoio avatar Dec 15 '22 11:12 nichoio

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 22 '23 18:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Apr 21 '23 18:04 k8s-triage-robot

/kind support

RyanStan avatar May 15 '23 14:05 RyanStan

Hi @OuFinx, are you still facing the issue?

mskanth972 avatar May 18 '23 13:05 mskanth972

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jun 17 '23 14:06 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 17 '23 14:06 k8s-ci-robot

/reopen

sdrabblescripta avatar Oct 02 '23 14:10 sdrabblescripta

@sdrabblescripta: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 02 '23 14:10 k8s-ci-robot

/remove-lifecycle rotten

sdrabblescripta avatar Oct 02 '23 14:10 sdrabblescripta