Add init container to glusterd2 pod to wait for DNS
In gluster/glusterd2#1324, it looks like DNS isn't ready by the time gd2 tries to resolve its own hostname.
We should add an init container like etcd does to wait for DNS.
Example:
$ kubectl -n gcs get po/etcd-sv9sxbvm7j -oyaml
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  ...
  initContainers:
  - command:
    - /bin/sh
    - -c
    - "\n\t\t\t\t\twhile ( ! nslookup etcd-sv9sxbvm7j.etcd.gcs.svc )\n\t\t\t\t\tdo\n\t\t\t\t\t\tsleep
      2\n\t\t\t\t\tdone"
    image: busybox:1.28.0-glibc
    imagePullPolicy: IfNotPresent
    name: check-dns
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  ...
I added the init container to the glusterd2 pod but am still facing the same issue. Here are the details and logs.
Init container:
initContainers:
- name: check-dns
  command: ["/bin/sh","-c","until nslookup gluster-{{ kube_hostname }}-0.glusterd2.{{ gcs_namespace }}.svc.cluster.local; do echo waiting for gluster-{{ kube_hostname }}-0.glusterd2.{{ gcs_namespace }}.svc.cluster.local; sleep 2;done;"]
  image: busybox:1.28.0-glibc
  imagePullPolicy: IfNotPresent
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
Logs from the init container:
kubectl logs gluster-kube1-0 check-dns -ngcs
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
Name: gluster-kube1-0.glusterd2.gcs.svc.cluster.local
Address 1: 10.233.64.8 gluster-kube1-0.glusterd2.gcs.svc.cluster.local
glusterd2 still failed with the same error:
time="2018-11-15 06:45:56.004341" level=fatal msg="failed to create gd2-muxsrv listener" error="listen tcp: lookup gluster-kube1-0.glusterd2.gcs on 10.233.0.3:53: no such host" source="[server.go:24:muxsrv.newMuxSrv]"
Does gluster-kube1-0.glusterd2.gcs vs gluster-kube1-0.glusterd2.gcs.svc.cluster.local make a difference?
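One way to see why the two names might behave differently is to list the fully qualified names a resolver could derive from the short name using the `search` line in the pod's `/etc/resolv.conf`. The sketch below is only a diagnostic aid: `search_candidates` is a hypothetical helper, not part of gd2 or Kubernetes, and it assumes the resolver appends each search domain in order. Whether busybox `nslookup` and gd2's resolver actually do so was not verified in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the fully qualified candidates that can be
# derived from NAME by appending each domain on the "search" line of a
# resolv.conf-style file (defaults to /etc/resolv.conf).
search_candidates() {
    name=$1
    conf=${2:-/etc/resolv.conf}
    # Keep only the last "search" line if the file has several.
    domains=$(awk '$1 == "search" { $1 = ""; line = $0 } END { print line }' "$conf")
    # Word splitting on $domains is intentional: one candidate per domain.
    for domain in $domains; do
        echo "$name.$domain"
    done
}
```

Running `search_candidates gluster-kube1-0.glusterd2.gcs` inside both the init container and the glusterd2 container would show whether `gluster-kube1-0.glusterd2.gcs.svc.cluster.local` is among the candidates in each.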
@Madhu-1 any progress made on this?
I worked on John's proposal but had no luck. I will take another look.
@JohnStrunk gluster-kube1-0.glusterd2.gcs is not reachable from the init container, but gluster-kube1-0.glusterd2.gcs.svc.cluster.local is.
It appears gd2 is using .gcs as opposed to .gcs.svc.cluster.local. Does the init container fix the problem if we wait on the same DNS name as gd2? (hence my initial cryptic question :disappointed:)
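A minimal sketch of waiting on the same name gd2 uses, written as a reusable loop with an upper bound on retries so the init container cannot spin forever. `wait_for_dns` and the `RESOLVER` override are hypothetical names introduced here for illustration; the default resolver is `nslookup`, as in the init containers above.

```shell
#!/bin/sh
# Hypothetical helper: wait until NAME resolves, retrying every INTERVAL
# seconds and giving up after MAX_TRIES failed attempts.
# RESOLVER is an illustrative override (defaults to nslookup) so the loop
# can be exercised without a DNS server.
wait_for_dns() {
    name=$1
    interval=${2:-2}
    max_tries=${3:-30}
    tries=0
    until ${RESOLVER:-nslookup} "$name" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "gave up waiting for $name" >&2
            return 1
        fi
        echo "waiting for $name"
        sleep "$interval"
    done
    echo "$name resolved"
}
```

In the init container this could be called first with the short name from the gd2 error (`wait_for_dns gluster-kube1-0.glusterd2.gcs`) and then with the `.svc.cluster.local` form, so that the pod only starts once both resolve.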
I tried the other name as well: the .gcs name is not reachable from the init container (possibly because of headless services and stateful sets).
With .gcs.svc.cluster.local the init container was able to resolve the DNS name, but that does not fix the problem.
@JohnStrunk I am out of ideas. I tried both init containers and the pod's spec.containers.lifecycle.postStart hook, but no luck.
If I run the same script inside the container it works as expected, but it does not work with postStart.
The init container works as expected (nslookup was successful), but I still see the issue when the pod restarts.