
GlusterD on Kubernetes: silent failures from systemctl start glusterd.

jayunit100 opened this issue · 1 comment

Note: I didn't set up an etcd URL. I assume that either way glusterd should fail fast and loudly if etcd isn't working; instead, it fails silently.

Observed behavior

After running the kube cluster recipes, the Gluster pods are running and report healthy, but systemctl status glusterd2 tells another story: the service has completely failed.

Expected/desired behavior

Pods should exit if glusterd can't start up, or at least log the failure to stderr. Right now there are no logs, and the only way to know it's broken is to run glustercli peer status (or similar) inside the pod.
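
For illustration, that manual check looks roughly like this (a sketch; the pod name is a placeholder, and the namespace and label come from the manifest below):

# List the glusterd2 pods created by the DaemonSet below
kubectl -n gluster-storage get pods -l name=glusterd2-daemon

# Exec into one of them and query the peers; today this is the only way
# to notice that glusterd2 is broken, since the pod still shows Running
kubectl -n gluster-storage exec -it <glusterd2-pod> -- glustercli peer status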

Details on how to reproduce (minimal and precise)

Create the following file:

---
apiVersion: v1
kind: Namespace
metadata:
  name: gluster-storage
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gluster
  namespace: gluster-storage
  labels:
    gluster-storage: glusterd2
spec:
  selector:
    matchLabels:
      name: glusterd2-daemon
  template:
    metadata:
      labels:
        name: glusterd2-daemon
    spec:
      containers:
        - name: glusterd2
          image: docker.io/gluster/glusterd2-nightly:20190204
# TODO: Enable the below once passing environment variables to the containers is fixed
#          env:
#            - name: GD2_RESTAUTH
#              value: "false"
# Enable if an external etcd cluster has been set up
#            - name: GD2_ETCDENDPOINTS
#              value: "http://gluster-etcd:2379"
# Generate and set a random uuid here
#            - name: GD2_CLUSTER_ID
#              value: "9610ec0b-17e7-405e-82f7-5f78d0b22463"
          securityContext:
            capabilities: {}
            privileged: true
          volumeMounts:
            - name: gluster-dev
              mountPath: "/dev"
            - name: gluster-cgroup
              mountPath: "/sys/fs/cgroup"
              readOnly: true
            - name: gluster-lvm
              mountPath: "/run/lvm"
            - name: gluster-kmods
              mountPath: "/usr/lib/modules"
              readOnly: true

      volumes:
        - name: gluster-dev
          hostPath:
            path: "/dev"
        - name: gluster-cgroup
          hostPath:
            path: "/sys/fs/cgroup"
        - name: gluster-lvm
          hostPath:
            path: "/run/lvm"
        - name: gluster-kmods
          hostPath:
            path: "/usr/lib/modules"

---
apiVersion: v1
kind: Service
metadata:
  name: glusterd2-service
  namespace: gluster-storage
spec:
  selector:
    name: glusterd2-daemon
  ports:
    - protocol: TCP
      port: 24007
      targetPort: 24007
# GD2 will be available on kube-host:31007 externally
      nodePort: 31007
  type: NodePort
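
Apply it to the cluster (a sketch; the filename is arbitrary, use whatever you saved the manifest as):

kubectl apply -f glusterd2-daemonset.yml
kubectl -n gluster-storage get pods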

Then exec -t -i into one of the pods. The pod looks healthy, but running systemctl status glusterd2 shows that the unit has failed. Re-running the command manually, you will then see the following logs:

WARNING: 2019/02/04 19:43:51 grpc: addrConn.createTransport failed to connect to {[fe80::345c:baff:fefe:edc6]:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp [fe80::345c:baff:fefe:edc6]:2379: connect: invalid argument". Reconnecting...
WARNING: 2019/02/04 19:43:51 grpc: addrConn.createTransport failed to connect to {[fe80::345c:baff:fefe:edc6]:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp [fe80::345c:baff:fefe:edc6]:2379: connect: invalid argument". Reconnecting...

jayunit100 avatar Feb 04 '19 19:02 jayunit100

@jayunit100 I don't see the below part of the code in your template, which is responsible for the health check:

livenessProbe:
  httpGet:
    path: /ping
    port: 24007
  initialDelaySeconds: 10
  periodSeconds: 60

Please refer to https://github.com/gluster/gcs/blob/master/deploy/templates/gcs-manifests/gcs-gd2.yml.j2 for more information.
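
In context, the probe goes under the glusterd2 container in the DaemonSet spec, roughly like this (a sketch based on the manifest above, not the full gcs template):

      containers:
        - name: glusterd2
          image: docker.io/gluster/glusterd2-nightly:20190204
          livenessProbe:
            httpGet:
              path: /ping
              port: 24007
            initialDelaySeconds: 10
            periodSeconds: 60

With the probe in place, the kubelet restarts the container when /ping stops responding, instead of leaving a dead glusterd2 behind a healthy-looking pod.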

Madhu-1 avatar Feb 05 '19 04:02 Madhu-1