
grpc, rest and msp pods stuck in init

Open wibed opened this issue 2 years ago • 5 comments

related to #1150

Adding 3 more nodes resolved the error messages, but the pods are still stuck in init.

Any advice is appreciated.


wibed avatar Jul 03 '22 05:07 wibed

Reporting similar issue here:

NAME                              READY   STATUS             RESTARTS        AGE
core-agents-6b6bfc75-88pdm        0/1     Init:1/2           0               8m48s
csi-controller-754885db79-9x955   0/3     Init:0/1           0               8m48s
mayastor-bc6kd                    1/1     Running            0               8m48s
mayastor-csi-6pg5v                1/2     CrashLoopBackOff   6 (2m51s ago)   8m48s
mayastor-csi-gfp4r                1/2     CrashLoopBackOff   6 (3m1s ago)    8m47s
mayastor-csi-h4ttr                1/2     CrashLoopBackOff   6 (2m37s ago)   8m47s
mayastor-d88f8                    1/1     Running            0               8m47s
mayastor-etcd-0                   0/1     Pending            0               8m48s
mayastor-etcd-1                   0/1     Pending            0               8m47s
mayastor-etcd-2                   0/1     Pending            0               8m47s
mayastor-slzbx                    1/1     Running            0               8m47s
msp-operator-864dd49b79-rwgm6     0/1     Init:1/2           0               8m48s
nats-0                            2/2     Running            0               8m48s
nats-1                            2/2     Running            0               8m27s
nats-2                            2/2     Running            0               8m17s
rest-765c7c6d5b-h8lxx             0/1     Init:1/2           0               8m48s

I have three mayastor worker nodes.

However, I did manage to start it successfully once:

NAME                              READY   STATUS             RESTARTS         AGE
core-agents-6b6bfc75-rxwk6        1/1     Running            0                4h1m
csi-controller-754885db79-cwfm9   3/3     Running            0                4h1m
mayastor-csi-gj9v4                1/2     CrashLoopBackOff   51 (4m8s ago)    4h1m
mayastor-csi-kvjbx                1/2     CrashLoopBackOff   51 (4m31s ago)   4h1m
mayastor-csi-qrvf5                1/2     CrashLoopBackOff   51 (3m46s ago)   4h1m
mayastor-etcd-0                   1/1     Running            0                4h1m
mayastor-etcd-1                   1/1     Running            0                4h1m
mayastor-etcd-2                   1/1     Running            0                4h1m
mayastor-h5bxg                    1/1     Running            0                4h1m
mayastor-nkvmg                    1/1     Running            0                4h1m
mayastor-q2l99                    1/1     Running            0                4h1m
msp-operator-864dd49b79-vfltd     1/1     Running            0                4h1m
nats-0                            2/2     Running            0                4h1m
nats-1                            2/2     Running            0                4h1m
nats-2                            2/2     Running            0                4h1m
rest-765c7c6d5b-gk7xv             1/1     Running            0                4h1m

tz-torchai avatar Jul 04 '22 12:07 tz-torchai

@tz-torchai I managed to resolve it by allowing privileged containers in the namespace, as follows:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: mayastor
    pod-security.kubernetes.io/enforce: privileged
  name: mayastor
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
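For reference, the same Pod Security label can be applied to an existing namespace with a one-line kubectl command instead of re-applying the whole Namespace object (a sketch; adjust the namespace name if yours differs):

```shell
# Label the mayastor namespace so the Pod Security admission controller
# allows privileged pods (equivalent to the label in the manifest above).
kubectl label namespace mayastor \
  pod-security.kubernetes.io/enforce=privileged --overwrite
```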

I would like to know if there is a better approach, or whether I am missing some required privileges.

wibed avatar Jul 10 '22 08:07 wibed

@tz-torchai from your first snippet:

mayastor-etcd-0                   0/1     Pending            0               8m48s
mayastor-etcd-1                   0/1     Pending            0               8m47s
mayastor-etcd-2                   0/1     Pending            0               8m47s

etcd is not running, which is why the other pods are stuck in the init phase.

tiagolobocastro avatar Jul 11 '22 09:07 tiagolobocastro
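A Pending pod generally means the scheduler cannot place it, most often because a PersistentVolumeClaim is unbound or no node satisfies its constraints. The scheduler's reason can be read from the pod events (a sketch, assuming the default mayastor namespace used in this thread):

```shell
# The Events section at the bottom explains why the pod is Pending.
kubectl -n mayastor describe pod mayastor-etcd-0

# Check whether the etcd PVCs are bound to a PersistentVolume.
kubectl -n mayastor get pvc
```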

@wibed would you be able to reproduce it again and post a get pods (after a few minutes) ?

tiagolobocastro avatar Jul 11 '22 09:07 tiagolobocastro

I fixed this by removing the following block from the spec of each affected pod with init probes.

hostNetwork: true
# To resolve services in the namespace
dnsPolicy: ClusterFirstWithHostNet
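Rather than editing manifests by hand, the same removal can be sketched as a JSON patch against the live workload; the deployment name `core-agents` here is only an example from the listings above, so substitute each affected workload:

```shell
# Remove hostNetwork and dnsPolicy from a deployment's pod template.
# "core-agents" is an example name; repeat for each affected workload.
# Note: this changes the pods' networking and DNS behaviour.
kubectl -n mayastor patch deployment core-agents --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/hostNetwork"},
  {"op": "remove", "path": "/spec/template/spec/dnsPolicy"}
]'
```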

Firaenix avatar Aug 27 '22 13:08 Firaenix