Add init container to glusterd2 pod to wait for DNS

Open JohnStrunk opened this issue 7 years ago • 8 comments

In gluster/glusterd2#1324, it looks like DNS isn't ready by the time gd2 tries to resolve its own hostname.

We should add an init container like etcd does to wait for DNS.

Example:

$ kubectl -n gcs get po/etcd-sv9sxbvm7j -oyaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
...
  initContainers:
  - command:
    - /bin/sh
    - -c
    - "\n\t\t\t\t\twhile ( ! nslookup etcd-sv9sxbvm7j.etcd.gcs.svc )\n\t\t\t\t\tdo\n\t\t\t\t\t\tsleep
      2\n\t\t\t\t\tdone"
    image: busybox:1.28.0-glibc
    imagePullPolicy: IfNotPresent
    name: check-dns
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
...

JohnStrunk, Nov 13 '18 17:11
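
Unescaped, the check-dns command in the etcd pod above is a small retry loop around nslookup. A minimal sketch of the same loop as a shell function follows. The attempt limit is an assumption added here (the etcd init container loops until the name resolves), and `wait_for_dns`, `RESOLVER`, and the parameter names are illustrative, not part of etcd or glusterd2.

```shell
#!/bin/sh
# wait_for_dns NAME [MAX_TRIES] [DELAY]
# Poll until NAME resolves, sleeping DELAY seconds between attempts.
# Returns 1 after MAX_TRIES failed lookups (assumption: the original
# etcd loop has no such limit and retries forever).
# RESOLVER is a hypothetical override for the lookup command.
wait_for_dns() {
    name=$1
    max_tries=${2:-30}
    delay=${3:-2}
    tries=0
    until ${RESOLVER:-nslookup} "$name" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up on $name after $tries attempts" >&2
            return 1
        fi
        sleep "$delay"
    done
}
```

Bounding the loop makes a DNS problem show up as a failed init container rather than a pod stuck in Init indefinitely; whether that is preferable here is a judgment call.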

I added the initContainer for glusterd2 but am still facing the same issue. Here are the logs.

initContainer

initContainers:
      - name: check-dns
        command: ["/bin/sh","-c","until nslookup gluster-{{ kube_hostname }}-0.glusterd2.{{ gcs_namespace }}.svc.cluster.local; do echo waiting for gluster-{{ kube_hostname }}-0.glusterd2.{{ gcs_namespace }}.svc.cluster.local; sleep 2;done;"]
        image: busybox:1.28.0-glibc
        imagePullPolicy: IfNotPresent
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

logs from initContainer:

kubectl logs gluster-kube1-0 check-dns -ngcs
Server:    10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
Name:      gluster-kube1-0.glusterd2.gcs.svc.cluster.local
Address 1: 10.233.64.8 gluster-kube1-0.glusterd2.gcs.svc.cluster.local
glusterd2 then failed with the same error:
time="2018-11-15 06:45:56.004341" level=fatal msg="failed to create gd2-muxsrv listener" error="listen tcp: lookup gluster-kube1-0.glusterd2.gcs on 10.233.0.3:53: no such host" source="[server.go:24:muxsrv.newMuxSrv]"

Madhu-1, Nov 15 '18 07:11

Does gluster-kube1-0.glusterd2.gcs vs gluster-kube1-0.glusterd2.gcs.svc.cluster.local make a difference?

JohnStrunk, Nov 15 '18 16:11

@Madhu-1 any progress made on this?

atinmu, Nov 30 '18 11:11

I worked on John's proposal but had no luck. I will take another look.

Madhu-1, Nov 30 '18 11:11

Does gluster-kube1-0.glusterd2.gcs vs gluster-kube1-0.glusterd2.gcs.svc.cluster.local make a difference?

@JohnStrunk gluster-kube1-0.glusterd2.gcs is not reachable from the init container, but gluster-kube1-0.glusterd2.gcs.svc.cluster.local is.

Madhu-1, Nov 30 '18 11:11

It appears gd2 is using .gcs as opposed to .gcs.svc.cluster.local. Does the init container fix the problem if we wait on the same DNS name as gd2? (hence my initial cryptic question :disappointed:)

JohnStrunk, Nov 30 '18 15:11
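
To compare the two names directly, a small sketch like the following could be run from the init container. It derives the short form that appears in the gd2 error from the fully qualified name and checks both. `short_name` and `check_names` are hypothetical helper names, and the suffix `.svc.cluster.local` is taken from the logs above rather than discovered from the cluster.

```shell
#!/bin/sh
# short_name FQDN
# Strip the cluster suffix, giving the form seen in the gd2 error
# (e.g. gluster-kube1-0.glusterd2.gcs).
short_name() {
    printf '%s\n' "${1%.svc.cluster.local}"
}

# check_names FQDN
# Report whether the short and the fully qualified forms resolve.
check_names() {
    for name in "$(short_name "$1")" "$1"; do
        if nslookup "$name" >/dev/null 2>&1; then
            echo "$name: resolves"
        else
            echo "$name: does not resolve"
        fi
    done
    # The short form usually depends on the resolver's search list,
    # so print it for comparison.
    grep '^search' /etc/resolv.conf || echo "no search list configured"
}
```

For the pod in this thread the call would be `check_names gluster-kube1-0.glusterd2.gcs.svc.cluster.local`. If only the fully qualified form resolves, an init container that waits on that form does not prove that the name gd2 looks up is resolvable.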

I also tried the other name. The .gcs form is not reachable from the init container (maybe due to headless services and stateful sets). With .gcs.svc.cluster.local the init container was able to resolve the DNS name, but that does not fix the problem.

Madhu-1, Dec 05 '18 04:12

@JohnStrunk I am out of ideas. I tried both init containers and the pod's spec.containers.lifecycle.postStart hook, but no luck.

If I run the same script inside the container, it works as expected, but it does not work with postStart. The init container works as expected (nslookup was successful), but I still see the issue when the pod restarts.

Madhu-1, Dec 07 '18 11:12
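
Since the same script works when run inside the container, one option not tried in this thread is to do the wait in the container's own entrypoint, so that it repeats every time the container starts. This may matter because, as far as I know, init containers are not re-run when only the application container restarts, and a postStart hook is not guaranteed to run before the entrypoint. A sketch under those assumptions follows. `wait_then_exec`, `RESOLVER`, and `DELAY` are hypothetical names, and the real glusterd2 command line is not shown in the thread, so it is passed in as arguments.

```shell
#!/bin/sh
# wait_then_exec NAME CMD [ARGS...]
# Block until NAME resolves, then replace this shell with CMD.
# RESOLVER and DELAY are hypothetical overrides for the lookup
# command and the retry interval in seconds.
wait_then_exec() {
    name=$1
    shift
    until ${RESOLVER:-nslookup} "$name" >/dev/null 2>&1; do
        echo "waiting for $name" >&2
        sleep "${DELAY:-2}"
    done
    # exec keeps the daemon as the container's main process so that
    # signals from the kubelet reach it directly.
    exec "$@"
}
```

The name to wait on would be the one gd2 itself resolves (the .gcs form in the error above), followed by whatever command the image normally starts.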