NFS provisioner fails
Hey, great project! I tried to get the Helm nfs-server-provisioner running and ran into some roadblocks on a GCE CentOS 8 machine.
First, I just started the Helm chart with this config (run through envsubst):
replicaCount: 1
image:
  repository: quay.io/kubernetes_incubator/nfs-provisioner
  tag: v2.2.1-k8s1.12
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  nfsPort: 2049
  mountdPort: 20048
  rpcbindPort: 51413
  externalIPs: []
persistence:
  enabled: true
  storageClass: "-"
  accessMode: ReadWriteOnce
  size: "100Gi"
storageClass:
  create: true
  defaultClass: false
  name: nfs
  allowVolumeExpansion: true
  parameters: {}
  mountOptions:
    - vers=4.1
    - noatime
  reclaimPolicy: Retain
rbac:
  create: true
  serviceAccountName: default
resources: {}
nodeSelector:
  kubernetes.io/hostname: ${nodename}
tolerations: []
affinity: {}
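For reference, a minimal sketch of the render-and-install step; the template file name values.yaml.tpl and the Helm 3 invocation against the stable chart repo are assumptions for illustration (the release name nfs-release matches the pod names below):

# Render the env-templated values file, then install the chart.
# values.yaml.tpl and Helm 3 syntax are assumptions.
export nodename=$(hostname)                       # node to pin the NFS server to
envsubst < values.yaml.tpl > values.yaml          # substitutes ${nodename}
helm install nfs-release stable/nfs-server-provisioner -f values.yaml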
Afterwards, the necessary PV is created and a PVC is bound to the "nfs" storage class:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "${nfspvcname}"
spec:
  capacity:
    storage: "100Gi"
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "${mainstoragepath}"
  claimRef:
    namespace: default
    name: "${nfspvcname}"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: default-pvc
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: "90Gi"
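This manifest can be applied the same way; a sketch, assuming it lives in a template file pv.yaml.tpl:

# Substitute ${nfspvcname} and ${mainstoragepath}, then apply
# (the template file name is an assumption).
envsubst < pv.yaml.tpl | kubectl apply -f -
# Verify that the PV binds and the chart's PVC is satisfied
kubectl get pv,pvc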
$nfspvcname is set to the name of the PVC created by the NFS chart.
Now the pod for the NFS server crashes continuously:
kubectl describe pod nfs-release-nfs-server-provisioner-0
[INFO] Entering RootlessKit namespaces: OK
Name: nfs-release-nfs-server-provisioner-0
Namespace: default
Node: centos/10.0.2.100
Start Time: Fri, 03 Jan 2020 17:49:16 +0000
Labels: app=nfs-server-provisioner
chart=nfs-server-provisioner-0.3.2
controller-revision-hash=nfs-release-nfs-server-provisioner-79c9977558
heritage=Helm
release=nfs-release
statefulset.kubernetes.io/pod-name=nfs-release-nfs-server-provisioner-0
Annotations: <none>
Status: Running
IP: 10.88.0.4
IPs:
IP: 10.88.0.4
Controlled By: StatefulSet/nfs-release-nfs-server-provisioner
Containers:
nfs-server-provisioner:
Container ID: docker://8ce423f7c0df95d08a4c49531b0fd59d6a8e8d97afd9e7756a99a38e51b9736f
Image: quay.io/kubernetes_incubator/nfs-provisioner:v2.2.1-k8s1.12
Image ID: docker-pullable://quay.io/kubernetes_incubator/nfs-provisioner@sha256:f0f0d9d39f8aac4a2f39a1b0b602baa993bca0f22c982f208ca9d7a0d2b2399f
Ports: 2049/TCP, 20048/TCP, 111/TCP, 111/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/UDP
Args:
-provisioner=cluster.local/nfs-release-nfs-server-provisioner
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 03 Jan 2020 17:56:16 +0000
Finished: Fri, 03 Jan 2020 17:56:23 +0000
Ready: False
Restart Count: 6
Environment:
POD_IP: (v1:status.podIP)
SERVICE_NAME: nfs-release-nfs-server-provisioner
POD_NAMESPACE: default (v1:metadata.namespace)
Mounts:
/export from data (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-nfs-release-nfs-server-provisioner-0
ReadOnly: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=centos
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler error while running "VolumeBinding" filter plugin for pod "nfs-release-nfs-server-provisioner-0": pod has unbound immediate PersistentVolumeClaims
Warning FailedScheduling <unknown> default-scheduler error while running "VolumeBinding" filter plugin for pod "nfs-release-nfs-server-provisioner-0": pod has unbound immediate PersistentVolumeClaims
Normal Scheduled <unknown> default-scheduler Successfully assigned default/nfs-release-nfs-server-provisioner-0 to centos
Normal Pulling 11m kubelet, centos Pulling image "quay.io/kubernetes_incubator/nfs-provisioner:v2.2.1-k8s1.12"
Normal Pulled 10m kubelet, centos Successfully pulled image "quay.io/kubernetes_incubator/nfs-provisioner:v2.2.1-k8s1.12"
Normal Pulled 10m (x2 over 10m) kubelet, centos Container image "quay.io/kubernetes_incubator/nfs-provisioner:v2.2.1-k8s1.12" already present on machine
Normal Created 10m (x3 over 10m) kubelet, centos Created container nfs-server-provisioner
Normal Started 10m (x3 over 10m) kubelet, centos Started container nfs-server-provisioner
Warning BackOff 9m47s (x3 over 10m) kubelet, centos Back-off restarting failed container
Warning MissingClusterDNS 50s (x57 over 11m) kubelet, centos pod: "nfs-release-nfs-server-provisioner-0_default(ba5bd6f1-c4af-4f78-93f2-c9a48d126ba9)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
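The describe output only shows the back-off, not why the container exits with code 255; the logs of the previous (crashed) container instance should show the actual error, e.g.:

# Fetch the logs of the last terminated container of the NFS pod
kubectl logs nfs-release-nfs-server-provisioner-0 --previous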
Since the error seems to be related to missing DNS services, I tried to set up kube-dns via https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/12-dns-addon.md
I had to change the IP from 10.32.0.10 to 10.0.0.10, but the DNS pod also fails:
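A sketch of that change, assuming the CoreDNS manifest URL used by the guide at the time; note that the kubelet must also be started with a matching --cluster-dns=10.0.0.10 (and --cluster-domain=cluster.local) for the MissingClusterDNS warning above to go away:

# Rewrite the Service clusterIP from the guide's 10.32.0.10 to this
# cluster's service CIDR before applying (manifest URL per the guide).
curl -sL https://storage.googleapis.com/kubernetes-the-hard-way/coredns.yaml \
  | sed 's/10\.32\.0\.10/10.0.0.10/' \
  | kubectl apply -f -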
kubectl describe pod coredns-68567cdb47-78x67 --namespace kube-system
[INFO] Entering RootlessKit namespaces: OK
Name: coredns-68567cdb47-78x67
Namespace: kube-system
Priority Class Name: system-cluster-critical
Node: centos/10.0.2.100
Start Time: Fri, 03 Jan 2020 17:48:54 +0000
Labels: k8s-app=kube-dns
pod-template-hash=68567cdb47
Annotations: <none>
Status: Running
IP: 10.88.0.3
IPs:
IP: 10.88.0.3
Controlled By: ReplicaSet/coredns-68567cdb47
Containers:
coredns:
Container ID: docker://387906805acc0fff0f1bbf1e392e886e77c09363bbcce61720db4f316862aaa7
Image: coredns/coredns:1.6.2
Image ID: docker-pullable://coredns/coredns@sha256:12eb885b8685b1b13a04ecf5c23bc809c2e57917252fd7b0be9e9c00644e8ee5
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 03 Jan 2020 17:59:48 +0000
Finished: Fri, 03 Jan 2020 17:59:48 +0000
Ready: False
Restart Count: 7
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/coredns-68567cdb47-78x67 to centos
Normal Pulling 14m kubelet, centos Pulling image "coredns/coredns:1.6.2"
Normal Pulled 14m kubelet, centos Successfully pulled image "coredns/coredns:1.6.2"
Normal Created 13m (x5 over 14m) kubelet, centos Created container coredns
Normal Started 13m (x5 over 14m) kubelet, centos Started container coredns
Normal Pulled 13m (x4 over 14m) kubelet, centos Container image "coredns/coredns:1.6.2" already present on machine
Warning BackOff 4m51s (x50 over 14m) kubelet, centos Back-off restarting failed container
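As with the NFS pod, the reason for the exit-code-1 crash should be in the previous container's logs rather than in the describe output, e.g.:

# Fetch the logs of the last terminated coredns container
kubectl logs --namespace kube-system coredns-68567cdb47-78x67 --previous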
Stderr of run.sh:
[kubelet-dockershim] E0103 18:03:48.735562 114 container_manager_linux.go:477] cpu and memory cgroup hierarchy not unified. cpu: /, memory: /user.slice/user-1000.slice/session-1.scope
[kubelet-dockershim] E0103 18:03:48.867398 114 container_manager_linux.go:101] Unable to ensure the docker processes run in the desired containers: errors moving "dockerd" pid: failed to apply oom score -999 to PID 85: write /proc/85/oom_score_adj: permission denied
...
[dockerd] time="2020-01-03T18:04:56.269919903Z" level=info msg="shim containerd-shim started" address=/containerd-shim/6ba264bcbcc738a4686c6b6bbc36cd4c96cbd3a5ff04b2a14b4064f48779d088.sock debug=false pid=7150
[dockerd] time="2020-01-03T18:04:56.903241134Z" level=info msg="shim reaped" id=d023513cf1411c90f83bf1a30d2d55a3e3cde300d59dea05f2ecafea011be2e1
[dockerd] time="2020-01-03T18:04:56.913507529Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
[dockerd] time="2020-01-03T18:05:05.238881369Z" level=info msg="shim containerd-shim started" address=/containerd-shim/1a5da1ec06ca697778b35df31d55a6f5befe987f6a82b138b3def79ccab895eb.sock debug=false pid=7269
[dockerd] time="2020-01-03T18:05:05.601143797Z" level=info msg="shim reaped" id=e6eebcd70064eed6d293f505e98c9e6a3d2a682213a281cdfdebecf245b2f3fa
[dockerd] time="2020-01-03T18:05:05.611475434Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
...
[kubelet-dockershim] E0103 18:05:06.218907 114 pod_workers.go:191] Error syncing pod 10011dcf-1fcc-4f90-9692-c27225bfb393 ("coredns-68567cdb47-78x67_kube-system(10011dcf-1fcc-4f90-9692-c27225bfb393)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "back-off 5m0s restarting failed container=coredns pod=coredns-68567cdb47-78x67_kube-system(10011dcf-1fcc-4f90-9692-c27225bfb393)"
[kubelet-dockershim] E0103 18:05:06.595575 114 pod_workers.go:191] Error syncing pod 9ff213d8-36e9-448c-9675-8f344be436fc ("coredns-68567cdb47-xvxnv_kube-system(9ff213d8-36e9-448c-9675-8f344be436fc)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "back-off 5m0s restarting failed container=coredns pod=coredns-68567cdb47-xvxnv_kube-system(9ff213d8-36e9-448c-9675-8f344be436fc)"
[kubelet-dockershim] E0103 18:06:14.151994 114 pod_workers.go:191] Error syncing pod ba5bd6f1-c4af-4f78-93f2-c9a48d126ba9 ("nfs-release-nfs-server-provisioner-0_default(ba5bd6f1-c4af-4f78-93f2-c9a48d126ba9)"), skipping: failed to "StartContainer" for "nfs-server-provisioner" with CrashLoopBackOff: "back-off 5m0s restarting failed container=nfs-server-provisioner pod=nfs-release-nfs-server-provisioner-0_default(ba5bd6f1-c4af-4f78-93f2-c9a48d126ba9)"
Regardless of the DNS issue, an unprivileged user can't currently mount NFS. We probably need some privileged helper daemon for the persistent volume stuff, or maybe we can use some FUSE implementation of NFS.
cc @giuseppe @rhatdan
W.r.t. the DNS issue, maybe you should try the rootless mode of k3s. It is based on Usernetes but includes the DNS stuff by default.
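A sketch, per the k3s docs at the time (the flag is experimental):

# Run the whole k3s server (including its bundled CoreDNS) in a user namespace
k3s server --rootless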
I would say the best bet for this is a FUSE-based NFS, since it is not likely that user-namespace root is going to be allowed to mount an NFS share any time soon. Also, NFS and user namespaces do not work well together if you are going to have multiple processes changing UIDs inside an environment. Having UID 1234 chown a file to UID 5678 is blocked on the server side inside a user namespace; NFS enforces permissions at the server side and has no concept of user-namespace CAP_CHOWN or CAP_DAC_OVERRIDE.
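For illustration, one existing FUSE NFS client is fuse-nfs from the libnfs project; roughly (flags per its README, unverified in a rootless setup, server address illustrative):

# Mount an export entirely in userspace via FUSE (no kernel NFS client)
fuse-nfs -n nfs://10.0.0.5/export -m /mnt/data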
Another potential option would be to set up an automounter; then the host kernel could mount directories on demand when a containerized process entered the mount point.
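A sketch of that approach with autofs on the host (paths, map names, and the export are illustrative):

# /etc/auto.master.d/u7s.autofs — hand /mnt/nfs to the automounter
sudo tee /etc/auto.master.d/u7s.autofs <<'EOF' >/dev/null
/mnt/nfs /etc/auto.u7s --timeout=60
EOF
# /etc/auto.u7s — map entry: the kernel mounts the export on first access
sudo tee /etc/auto.u7s <<'EOF' >/dev/null
data -fstype=nfs4,vers=4.1 nfsserver.example.com:/export/data
EOF
sudo systemctl restart autofs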