ceph-helm
Unable to mount volumes : timeout expired waiting for volumes to attach/mount
Is this a request for help?: Yes
Is this a BUG REPORT or FEATURE REQUEST? Bug report
Version of Helm and Kubernetes:
kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-18T23:58:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
helm version root@kubernetes
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Which chart: ceph-helm
What happened:
Unable to mount volumes for pod "mypod_default(e68c8e3e-6578-11e8-87c4-e83935e84dc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[vol1]
How to reproduce it (as minimally and precisely as possible): http://docs.ceph.com/docs/master/start/kube-helm/
Anything else we need to know:
The ceph cluster is working fine
ceph -s
cluster:
id: 88596d9e-b478-47a9-8208-3a6cea33d1d4
health: HEALTH_OK
services:
mon: 1 daemons, quorum kubernetes
mgr: kubernetes(active)
mds: cephfs-1/1/1 up {0=mds-ceph-mds-5696f9df5d-jbsgz=up:active}
osd: 1 osds: 1 up, 1 in
rgw: 1 daemon active
data:
pools: 7 pools, 176 pgs
objects: 213 objects, 3391 bytes
usage: 108 MB used, 27134 MB / 27243 MB avail
pgs: 176 active+clean
Everything in the ceph namespace works fine. In the mon pod I can see that an image was created for the PVC:
rbd ls
kubernetes-dynamic-pvc-0077fdf9-6578-11e8-b1f8-b63c3e9e1eaa
kubectl get pvc root@kubernetes
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-pvc Bound pvc-c9d07cf9-6578-11e8-87c4-e83935e84dc8 1Gi RWO ceph-rbd 29m
I have changed resolv.conf and added kube-dns as a nameserver. I can resolve ceph-mon.ceph and ceph-mon.ceph.svc.local from the host node.
Some related kubelet logs that I found:
juin 01 11:24:19 kubernetes kubelet[32612]: E0601 11:24:19.587800 32612 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/rbd/[ceph-mon.ceph.svc.cluster.local:6789]:kubernetes-dynamic-pvc-0077fdf9-6578-11e8-b1f8-b63c3e9e1eaa\"" failed. No retries permitted until 2018-06-01 11:24:51.582365588 +0200 CEST m=+162261.330642194 (durationBeforeRetry 32s). Error: "MountVolume.WaitForAttach failed for volume \"pvc-004d66b7-6578-11e8-87c4-e83935e84dc8\" (UniqueName: \"kubernetes.io/rbd/[ceph-mon.ceph.svc.cluster.local:6789]:kubernetes-dynamic-pvc-0077fdf9-6578-11e8-b1f8-b63c3e9e1eaa\") pod \"ldap-ss-0\" (UID: \"f63432e0-6579-11e8-87c4-e83935e84dc8\") : error: exit status 1, rbd output: 2018-06-01 11:19:19.513914 7f1cf1f227c0 -1 did not load config file, using default settings.\n2018-06-01 11:19:19.579955 7f1cf1f20700 0 -- IP@:0/1002573 >> IP@:6789/0 pipe(0x3a2a3f0 sd=3 :53578 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:19.580065 7f1cf1f20700 0 -- IP@:0/1002573 >> IP@:6789/0 pipe(0x3a2a3f0 sd=3 :53578 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).fault\n2018-06-01 11:19:19.580437 7f1cf1f20700 0 -- IP@:0/1002573 >> 10.1.0.146:6789/0 pipe(0x3a2a3f0 sd=3 :53580 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:19.781427 7f1cf1f20700 0 -- 10.1.0.146:0/1002573 >> 10.1.0.146:6789/0 pipe(0x3a2a3f0 sd=3 :53584 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).**connect protocol feature mismatch**, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:20.182401 7f1cf1f20700 0 -- 10.1.0.146:0/1002573 >> 10.1.0.146:6789/0 pipe(0x3a2a3f0 sd=3 :53588 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).**connect protocol feature mismatch**, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:20.983428 7f1cf1f20700 0 -- IP@:0/1002573 >> ip@:6789/0 pipe(0x3a2a3f0 sd=3 
:53610 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).conne
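The "my ... < peer ... missing ..." triple that repeats in the log above can be checked by hand: the "missing" value should be the bits set in the peer mask but not in the client mask. The following is a minimal sketch in Python, using the two masks exactly as they appear in the log. The function name missing_feature_bits is made up for illustration and is not part of any Ceph tooling.

```python
from typing import List

# Feature masks copied from the kubelet log above (hexadecimal).
CLIENT_FEATURES = 0x83ffffffffffff   # the "my" value reported by the rbd client
PEER_FEATURES = 0x481dff8eea4fffb    # the "peer" value reported for the monitor


def missing_feature_bits(client: int, peer: int) -> List[int]:
    """Return the bit positions that are set in peer but not in client."""
    missing = peer & ~client
    return [bit for bit in range(missing.bit_length()) if (missing >> bit) & 1]


if __name__ == "__main__":
    # Hex form of the missing mask, comparable to the "missing" field in the log.
    print(hex(PEER_FEATURES & ~CLIENT_FEATURES))
    # Bit positions the client lacks.
    print(missing_feature_bits(CLIENT_FEATURES, PEER_FEATURES))
```

With these inputs the only missing bit is bit 58, which matches the "missing 400000000000000" value in the log. As far as I know, bit 58 is the one Ceph's feature table assigns to CRUSH_TUNABLES5, but treat that mapping as an assumption and verify it against ceph_features.h for your Ceph release.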
I don't know why it tries to connect to my Kubernetes node's external IP on port 6789. That port is only opened to the ceph-mon headless svc, which is:
kubectl get svc -n ceph root@kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ceph-mon ClusterIP None <none> 6789/TCP 1h
From the kubernetes node I can telnet to the port 6789
telnet ceph-mon.ceph 6789 root@kubernetes
Trying IP@ ...
Connected to ceph-mon.ceph.
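For anyone without telnet on the node, the same reachability check can be sketched in Python. This assumes the ceph-mon.ceph name resolves from wherever the script runs; the helper name can_connect is hypothetical. A successful TCP connect only shows that the port is reachable. It does not exercise the Ceph feature negotiation, so this check can pass while the rbd client still fails as in the logs above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Host and port taken from the telnet session above.
    print(can_connect("ceph-mon.ceph", 6789))
```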
The "connect protocol feature mismatch" in the kubelet logs could have something to do with this note
Important: Kubernetes uses the RBD kernel module to map RBDs to hosts. Luminous requires CRUSH_TUNABLES 5 (Jewel). The minimal kernel version for these tunables is 4.5. If your kernel does not support these tunables, run ceph osd crush tunables hammer
in the ceph-helm doc.
And yes, that was it. You only need to run
ceph osd crush tunables hammer
on the ceph-mon pod.
I will leave this here in case anybody else has the same issue :smiley:
/close
Hello @feresberbeche, thank you for this, very helpful. I was stuck because of my kernel version, 4.4.0... The upgrade has solved everything.
Version details:
Ceph: ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
Kubernetes:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Linux Kernel: 4.15.0-30-generic
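To tell quickly which side of the 4.5 kernel minimum from the ceph-helm doc a node is on, a small check like the one below may help. It is only a sketch: the function name kernel_supports_tunables5 is made up, and it compares only the major.minor part of the release string. It cannot know whether a distribution has backported the feature to an older kernel.

```python
import platform
import re

# Minimum kernel version quoted in the ceph-helm doc for CRUSH_TUNABLES 5.
MIN_KERNEL = (4, 5)


def kernel_supports_tunables5(release: str) -> bool:
    """Compare the major.minor of a kernel release string against MIN_KERNEL."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= MIN_KERNEL


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    release = platform.release()
    print(release, kernel_supports_tunables5(release))
```

With the two kernels mentioned in this thread, "4.4.0" falls below the minimum and "4.15.0-30-generic" is above it. If upgrading the kernel is not an option, the alternative from the doc is running ceph osd crush tunables hammer on the ceph-mon pod.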