Support NFS on EKS

Open cwwarren opened this issue 5 years ago • 23 comments

We have an application from a third-party vendor that we've been running in Kubernetes using an NFS volume to provide multiple-writer "local" storage. The NFS server is an AWS S3 File Gateway.

Image I'm using:

bottlerocket-aws-k8s-1.18-x86_64-v1.0.3-0c93e9a6 / ami-08f192e9c923b9f23 (eksctl selected this for me)

Attached to an EKS 1.18 (eks.2) cluster.

What I expected to happen:

The NFS mount succeeds, allowing the pod to run. When the Pod is scheduled on an Amazon Linux 2 Node Group, it succeeds.

What actually happened:

Events:
  Type     Reason       Age                      From     Message
  ----     ------       ----                     ----     -------
  Warning  FailedMount  44m (x45 over 7h52m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[redacted-vol-1 redacted-vol-2], unattached volumes=[redacted-vol-1 db-config default-token-pbxwv redacted-vol-2]: timed out waiting for the condition
  Warning  FailedMount  19m (x87 over 7h41m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[redacted-vol-2 redacted-vol-1], unattached volumes=[redacted-vol-2 redacted-vol-1 db-config default-token-pbxwv]: timed out waiting for the condition
  Warning  FailedMount  4m52s (x239 over 7h55m)  kubelet  MountVolume.SetUp failed for volume "redacted-vol-1" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/7d54f7fe-9a03-4ff7-8ab4-100a13f49a5f/volumes/kubernetes.io~nfs/redacted-vol-1 --scope -- mount -t nfs redacted-nfs-server:/redacted-app/redacted-folder /var/lib/kubelet/pods/7d54f7fe-9a03-4ff7-8ab4-100a13f49a5f/volumes/kubernetes.io~nfs/redacted-vol-1
Output: mount: /var/lib/kubelet/pods/7d54f7fe-9a03-4ff7-8ab4-100a13f49a5f/volumes/kubernetes.io~nfs/redacted-vol-1: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.

How to reproduce the problem:

Using eksctl, create an EKS cluster with an unmanaged Node Group with amiFamily: Bottlerocket.

Create a Pod with a volume from an NFS server and a container that mounts that volume, e.g.

volumes:
- name: my-folder
  nfs:
    server: my-nfs-server
    path: /my-s3-bucket/my-folder
volumeMounts:
- name: my-folder
  mountPath: /my-folder
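
For reference, a complete Pod manifest exercising the same in-tree NFS plugin might look like the sketch below (server name, export path, and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: nfs-test
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: my-folder
      mountPath: /my-folder
  volumes:
  - name: my-folder
    nfs:
      server: my-nfs-server
      path: /my-s3-bucket/my-folder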

cwwarren avatar Nov 24 '20 15:11 cwwarren

Hello @cwwarren, thank you for opening this issue. Are you using the EFS CSI driver? If so, we have a known issue that we are working on which I can describe further.

webern avatar Nov 24 '20 15:11 webern

Hello @cwwarren, thank you for opening this issue. Are you using the EFS CSI driver? If so, we have a known issue that we are working on which I can describe further.

Thanks for the quick reply! We are not and this is not an EFS volume. It is an exposed volume on a S3 File Gateway instance.

cwwarren avatar Nov 24 '20 17:11 cwwarren

Oh I see, sorry about that. So we're probably looking at something like this: https://github.com/kubernetes-csi/csi-driver-nfs, but I don't know if anyone has tried it on Bottlerocket yet.

webern avatar Nov 24 '20 18:11 webern

No worries, thanks for looking into it. We don't have that CSI driver running, and since I believe it requires a DaemonSet, we definitely would have noticed if EKS had installed it without our knowledge.

I believe the error is coming from the in-tree mounting code; from an (incomplete) skim, it appears to be constructing the command that caused the error, and the error message formats match up.

https://github.com/kubernetes/mount-utils/blob/72e9681f7438005a58689aee6d31b2346aceafce/mount_linux.go#L132

cwwarren avatar Nov 24 '20 19:11 cwwarren

Upon a little further investigation and googling, I believe (but haven't confirmed) that the root cause of this issue is that /sbin/mount.nfs is missing: mount looks for a /sbin/mount.nfs helper when asked to mount an NFS filesystem, and fails with the "bad option" error above when the helper isn't present.

This might actually be a feature request instead of a bug: "add NFS common utilities for /sbin/mount.nfs".

cwwarren avatar Nov 24 '20 19:11 cwwarren

We tried the NFS CSI driver on our side and found the same dependency at issue, /sbin/mount.nfs. I agree we can handle this as a feature request and will label it as such. Do you mind if we rename it to something like "Support NFS on EKS" so that it's easy to identify? Thank you for bringing this to our attention!

webern avatar Nov 25 '20 03:11 webern

Yes, please edit and categorize it however is easiest for your team.

cwwarren avatar Nov 25 '20 03:11 cwwarren

Hi @cwwarren, it would seem that the NFS CSI driver is currently in alpha and does not yet support pod inline volumes like the one you're using here.

The workaround is to create a persistent volume backed by your NFS server and mount that for your pods. You can find the driver parameters in the csi-driver-nfs documentation.
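
A statically provisioned PersistentVolume for the driver might look roughly like this sketch (server, share, and volumeHandle are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: nfs.csi.k8s.io
    # volumeHandle must be unique per volume in the cluster
    volumeHandle: my-nfs-server/my-share
    volumeAttributes:
      server: my-nfs-server
      share: /my-share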

Please let us know if that works for you.

etungsten avatar Nov 30 '20 20:11 etungsten

Hey @etungsten thanks for looking into it. We're not (yet) using the NFS CSI driver, we're still using the in-tree plugin so I doubt in our specific case we're running into any limitations of the CSI driver here. In this case this is an existing Deployment that runs successfully on AWS EKS AmazonLinux2 nodes (and before that Ubuntu and Debian on a kops cluster).

Those steps are a good reference though, thanks. I'll note that for when we do migrate to the CSI driver in the near future.

cwwarren avatar Nov 30 '20 20:11 cwwarren

Hi guys, any plans to get this one implemented anytime soon? I'm using AWS EKS v1.19 + Bottlerocket amazon-eks-gpu-node-1.19-v20210504, and when trying to use a PVC from AWS EFS (NFS) I get the error below:

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    3m21s                default-scheduler  Successfully assigned tyk/gateway-tyk-pro-lmtbc to ip-10-80-125-180.eu-central-1.compute.internal
  Warning  FailedMount  78s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[geoip], unattached volumes=[tyk-mgmt-gateway-conf tyk-scratch tyk-pro-default-cert geoip default-token-b72n6]: timed out waiting for the condition
  Warning  FailedMount  72s (x9 over 3m20s)  kubelet            MountVolume.SetUp failed for volume "pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/2028a85a-81dc-422f-933d-056a856a68da/volumes/kubernetes.io~nfs/pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409 --scope -- mount -t nfs -o vers=4.1 10.80.120.174:/persistent-volumes/xxxxxx/tyk/tyk-efs-volume-pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409 /var/lib/kubelet/pods/2028a85a-81dc-422f-933d-056a856a68da/volumes/kubernetes.io~nfs/pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409
Output: mount: /var/lib/kubelet/pods/2028a85a-81dc-422f-933d-056a856a68da/volumes/kubernetes.io~nfs/pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.

@webern @gregdek

bmonteiro avatar May 27 '21 06:05 bmonteiro

when trying to use a PVC from AWS EFS (NFS) I have got the below error:

I think this is different from the OP, who is trying to mount S3 as an NFS share.

The EFS CSI driver was known to be working fairly recently when I fixed this incompatibility:

  • https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/286
  • https://github.com/bottlerocket-os/bottlerocket/issues/1111

When working on it, the instructions I followed to deploy an EFS PVC were these: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html
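
(For reference, static provisioning per that guide boils down to a PersistentVolume along these lines; the file system ID is a placeholder.)

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi  # required by the API, but ignored by EFS
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678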

Does the user guide match what you are doing?

webern avatar May 27 '21 18:05 webern

@webern yes, in my case the EFS CSI driver is also installed, but I still get the same error.

  Warning  FailedMount  12s (x6 over 27s)  kubelet, ip-10-80-127-28.eu-central-1.compute.internal  MountVolume.SetUp failed for volume "pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/7912b416-c5a4-4be6-98e7-e794b003ddaa/volumes/kubernetes.io~nfs/pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409 --scope -- mount -t nfs -o vers=4.1 10.80.120.174:/persistent-volumes/eks2/tyk/tyk-efs-volume-pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409 /var/lib/kubelet/pods/7912b416-c5a4-4be6-98e7-e794b003ddaa/volumes/kubernetes.io~nfs/pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409
Output: mount: /var/lib/kubelet/pods/7912b416-c5a4-4be6-98e7-e794b003ddaa/volumes/kubernetes.io~nfs/pvc-8cfab7c8-9269-4586-8ffa-314b76d8d409: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
kubectl get pod -n kube-system -l app=efs-csi-node -o wide
NAME                 READY   STATUS    RESTARTS   AGE     IP              NODE                                             NOMINATED NODE   READINESS GATES
efs-csi-node-kkj8r   3/3     Running   0          7m28s   10.80.127.28    ip-10-80-127-28.eu-central-1.compute.internal    <none>           <none>

kubectl get nodes -o wide
NAME                                             STATUS   ROLES    AGE     VERSION              INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
ip-10-80-127-28.eu-central-1.compute.internal    Ready    <none>   7m47s   v1.19.9              10.80.127.28    <none>        Bottlerocket OS 1.0.8   5.4.105                       containerd://1.4.4+bottlerocket

In case it is not related, maybe there is already an issue open on that topic?

bmonteiro avatar May 28 '21 13:05 bmonteiro

maybe there is already an issue open on that topic ?

There was, but it was fixed. Do you mind opening a new one? I suggest a title like "Unable to mount EFS volume with the aws-efs-csi-driver". We'll try to figure out what's going on there and keep this issue for general-purpose NFS support, such as csi-driver-nfs.

webern avatar May 28 '21 16:05 webern

@webern sure thing, please refer to #1599

bmonteiro avatar May 28 '21 19:05 bmonteiro

Hey,

Is there any update on when NFS support will be added to Bottlerocket? We are trying to migrate from Amazon Linux 2 to Bottlerocket, but some of our pods require access to an NFS server.

We currently mount them via the nfs volume type in volumes:

spec:
  volumes:
  - name: nfs-server
    nfs:
      server: 10.10.1.1
      path: /path

basert avatar Oct 13 '21 15:10 basert

@basert, thanks for reaching out! If you're running on AWS, our understanding is you should be able to mount the volumes with the aws-efs-csi-driver detailed above. If not, can you elaborate on your use-case?

jpculp avatar Oct 13 '21 17:10 jpculp

In our use case, multiple legacy deployments use the same EFS share to access files. We tried getting this working with the EFS CSI driver but failed, because we need to mount the pre-existing EFS volume in multiple deployments.

The EFS driver needs either pre-created persistent volumes (which can't be claimed by multiple deployments in different namespaces) or a storage class, which requires an access point. The driver forces a basePath on each provisioned volume (https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/f6d289667ea71f2c2d1a9ec78b0224f066369e40/pkg/driver/controller.go#L201), so we can't use the same file root for multiple pods.
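
For context, dynamic provisioning with the EFS driver goes through an access-point StorageClass roughly like the one below (the file system ID is a placeholder), and every provisioned volume gets its own directory under basePath:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12345678
  directoryPerms: "700"
  basePath: /dynamic_provisioning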

Both solutions do not work for us.

basert avatar Oct 13 '21 18:10 basert

Now that I think about this a bit, I'm not sure we did everything correctly when trying to migrate to the EFS CSI driver. Logic-wise it should work just fine. I will test again and check if I can come up with a reproducible case :)

basert avatar Oct 13 '21 19:10 basert

Hello there

I cannot use the NFS CSI driver because it does not support read-only NFS endpoints. So this is still a problem, and the basic NFS mount should be supported.

duckie avatar Apr 07 '22 22:04 duckie

Hey @duckie, could you please help us clarify your use case? Do you want to:

  1. Use an NFS filesystem exposed by an NFS server as read-only, i.e.:
# /etc/exports
/nfsshare my.domain(ro)
  2. Mount the NFS filesystem in the pod as read-only, i.e.:
mount.nfs -o ro <> <>

For 1) we might have to ask the NFS CSI driver maintainers to support the use case; for 2) I think you can add mount options, as described in the NFS CSI README.
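
For example, a hypothetical csi-driver-nfs StorageClass that mounts the share read-only could look like this (server and share are placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-ro
provisioner: nfs.csi.k8s.io
parameters:
  server: my-nfs-server
  share: /my-share
mountOptions:
- ro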

arnaldo2792 avatar Apr 11 '22 23:04 arnaldo2792

Hello team, do you know if a solution to this problem has come out?

mlorenzo92 avatar Sep 16 '22 20:09 mlorenzo92

@mlorenzo92, thanks for reaching out. Are you able to take advantage of the aws-efs-csi-driver?

jpculp avatar Sep 16 '22 23:09 jpculp

@jpculp thanks! It's working perfectly. I installed the driver, and this doc helped me a lot: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html

mlorenzo92 avatar Sep 21 '22 13:09 mlorenzo92

We have verified https://github.com/kubernetes-csi/csi-driver-nfs successfully on Bottlerocket. The newer version (v4.1.0, at least) seems to work fine without needing nfs-common or other binary dependencies on the host OS.

The StorageClass to mount an on-prem NFS share was defined like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-sc
provisioner: nfs.csi.k8s.io
parameters:
  server: <nfs-domain>
  share: /<share-path>
  subDir: <subDir>
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
- nfsvers=3
- hard
- nolock
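
A claim against that class might look like this sketch (name and requested size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs-sc
  resources:
    requests:
      storage: 1Gi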

gazal-k avatar Jan 25 '23 06:01 gazal-k

Thanks for sharing @gazal-k ! @cwwarren, could you please try a newer version of the CSI driver and let us know if you are still having problems?

arnaldo2792 avatar Jan 31 '23 00:01 arnaldo2792

I have since switched jobs and am no longer working with Kubernetes or EFS, so I'm unable to readily test. However, the comments in this thread seem to indicate it's all working now! I'm more than happy to have this issue closed out.

Many thanks to the team for getting this shipped!

cwwarren avatar Jan 31 '23 00:01 cwwarren

A lot of history here in the comments, but if I understand correctly everything is now working as expected. If not, please open a new issue to track any problems. Thanks!

stmcginnis avatar Feb 16 '23 21:02 stmcginnis