noobaa-operator

Operator restarts multiple times on the 0518-nsfs build

Open rkomandu opened this issue 3 years ago • 6 comments

Environment info

  • NooBaa Operator Version: VERSION

  • Platform: OCP. Output of oc version:

        Client Version: 4.7.8
        Server Version: 4.7.8
        Kubernetes Version: v1.20.0+7d0a2b2

    The NooBaa version is posted below.

Actual behavior

  1. oc get pods

         NAME                                               READY   STATUS        RESTARTS   AGE
         noobaa-core-0                                      1/1     Running       0          20h
         noobaa-db-0                                        1/1     Running       0          20h
         noobaa-default-backing-store-noobaa-pod-0c22b5a3   0/1     Terminating   0          18h
         noobaa-endpoint-6bf7d8457d-t82ss                   1/1     Running       0          20h
         noobaa-operator-647bbcf485-gncb7                   1/1     Running       22         20h   ---> these restarts are a concern
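To spot this kind of restart loop without reading the table by eye, the RESTARTS column can be filtered with awk. This is a minimal sketch, assuming the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `flag_restarts` and the threshold value are hypothetical and not part of any NooBaa tooling.

```shell
#!/bin/sh
# flag_restarts MAX: read `oc get pods` output on stdin and print
# "NAME RESTARTS" for every pod whose restart count exceeds MAX.
# Assumes RESTARTS is the 4th column (default output format).
flag_restarts() {
    awk -v max="$1" 'NR > 1 && $4 + 0 > max + 0 { print $1, $4 }'
}

# Usage (needs cluster access):
#   oc get pods -n noobaa | flag_restarts 5
```

Run against the listing above with a threshold of 5, only the noobaa-operator pod would be printed.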

Expected behavior

Steps to reproduce

  1. Installed the 0518-nsfs on the OCP 4.7.8 cluster

More information - Screenshots / Logs / Other output

noobaa-operator-restarts-0518nsfsbld.log

noobaa version

    INFO[0000] CLI version: 5.8.0
    INFO[0000] noobaa-image: noobaa/noobaa-core:master-20210518-nsfs
    INFO[0000] operator-image: noobaa/noobaa-operator:5.8.0

noobaa status

    INFO[0001] CLI version: 5.8.0
    INFO[0001] noobaa-image: noobaa/noobaa-core:master-20210518-nsfs
    INFO[0001] operator-image: noobaa/noobaa-operator:master-20210518-nsfs
    INFO[0001] noobaa-db-image: centos/mongodb-36-centos7
    INFO[0001] Namespace: noobaa
    ....

Collected the output of "oc logs noobaa-operator-647bbcf485-gncb7".

Let me know if anything further is required.

rkomandu • May 20 '21 07:05

Hey @rkomandu, can you please add the logs from the failed container, using the --previous flag?

romayalon • May 20 '21 09:05

oc logs --previous=true noobaa-operator-647bbcf485-gncb7 >& /tmp/noobaa-operator-restarts-previous-0518nsfsbld.log

ls -lrt /tmp/noobaa-operator-restarts-previous-0518nsfsbld.log

    -rw-r--r-- 1 root root 640 May 20 03:23 /tmp/noobaa-operator-restarts-previous-0518nsfsbld.log

cat /tmp/noobaa-operator-restarts-previous-0518nsfsbld.log

    time="2021-05-20T09:02:42Z" level=info msg="CLI version: 5.8.0\n"
    time="2021-05-20T09:02:42Z" level=info msg="noobaa-image: noobaa/noobaa-core:master-20210518-nsfs\n"
    time="2021-05-20T09:02:42Z" level=info msg="operator-image: noobaa/noobaa-operator:5.8.0\n"
    I0520 09:02:43.134682 1 request.go:645] Throttling request took 1.008213008s, request: GET:https://172.30.0.1:443/apis/objectbucket.io/v1alpha1?timeout=32s
    time="2021-05-20T09:02:49Z" level=fatal msg="Failed to become leader: Get "https://172.30.0.1:443/api/v1/namespaces/noobaa/pods/noobaa-operator-647bbcf485-gncb7": dial tcp 172.30.0.1:443: connect: connection refused"
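Because the previous-container log is short, the entry that matters is the one marked level=fatal. A minimal sketch for pulling just those entries out of one or more saved logs follows; the helper name `fatal_lines` is hypothetical, and it assumes the saved file has one log entry per line, as `oc logs` writes it.

```shell
#!/bin/sh
# fatal_lines FILE...: print only the entries containing level=fatal
# from saved operator logs. -h suppresses file-name prefixes when
# several files are given. Exits non-zero if nothing matches.
fatal_lines() {
    grep -h 'level=fatal' "$@"
}

# Usage:
#   fatal_lines /tmp/noobaa-operator-restarts-previous-0518nsfsbld.log
```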

I don't think there is any further info in the log; it is very small.

rkomandu • May 20 '21 13:05

Adding the noobaa-core log and the oc describe output for noobaa-operator.

At present the noobaa-operator restart count is about 35. I am not running any IO on the cluster.

noobaa-core.log describe-noobaa-operator-0518-logs.txt

rkomandu • May 20 '21 15:05

Updated to the 20210520 build as per discussion with Romy

[root@rkomandu-hpo-inf ~]# oc get pods

    NAME                                               READY   STATUS        RESTARTS   AGE
    noobaa-core-0                                      1/1     Running       0          125m
    noobaa-db-0                                        1/1     Running       0          125m
    noobaa-default-backing-store-noobaa-pod-0c8f0a32   0/1     Terminating   0          3s
    noobaa-endpoint-877dfcd54-4vwlf                    1/1     Running       0          118m
    noobaa-operator-68bb5bff97-hbpzh                   1/1     Running       2          126m   --> this is happening

[root@rkomandu-hpo-inf ~]# oc logs --previous noobaa-operator-68bb5bff97-hbpzh > /tmp/noobaa-operator-20Maybuild.log
[root@rkomandu-hpo-inf ~]# ls -lrt /tmp/noobaa-operator-20Maybuild.log
-rw-r--r-- 1 root root 531 May 21 01:28 /tmp/noobaa-operator-20Maybuild.log
[root@rkomandu-hpo-inf ~]# less /tmp/noobaa-operator-20Maybuild.log
[root@rkomandu-hpo-inf ~]# noobaa version
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: noobaa/noobaa-core:master-20210520
INFO[0000] operator-image: noobaa/noobaa-operator:5.9.0
[root@rkomandu-hpo-inf ~]# oc version
Client Version: 4.7.8
Server Version: 4.7.8
Kubernetes Version: v1.20.0+7d0a2b2
[root@rkomandu-hpo-inf ~]# noobaa status
INFO[0001] CLI version: 5.9.0
INFO[0001] noobaa-image: noobaa/noobaa-core:master-20210520
INFO[0001] operator-image: noobaa/noobaa-operator:master-20210520
INFO[0001] noobaa-db-image: centos/mongodb-36-centos7
INFO[0001] Namespace: noobaa

cat /tmp/noobaa-operator-20Maybuild.log

    time="2021-05-21T07:00:20Z" level=info msg="CLI version: 5.9.0\n"
    time="2021-05-21T07:00:20Z" level=info msg="noobaa-image: noobaa/noobaa-core:master-20210520\n"
    time="2021-05-21T07:00:20Z" level=info msg="operator-image: noobaa/noobaa-operator:5.9.0\n"
    I0521 07:00:21.181798 1 request.go:645] Throttling request took 1.045524803s, request: GET:https://172.30.0.1:443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
    time="2021-05-21T07:00:37Z" level=fatal msg="Failed to become leader: etcdserver: request timed out"
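The two fatal entries captured so far differ in detail: the 0518 build failed with "connection refused" when dialing 172.30.0.1:443, while this one failed with "etcdserver: request timed out". Both occur at the "Failed to become leader" step, which may suggest the problem is on the API server or etcd side rather than in one particular build, though the logs alone do not prove that. A minimal sketch for tallying which variant each restart hit is below; the function name `classify_fatal` and the labels it prints are illustrative assumptions, not an official taxonomy.

```shell
#!/bin/sh
# classify_fatal: read log lines on stdin and print one label per
# level=fatal entry, based on the error text seen in this issue.
# Non-fatal lines are ignored.
classify_fatal() {
    while IFS= read -r line; do
        case $line in
            *level=fatal*"connection refused"*)
                echo "apiserver-unreachable" ;;
            *level=fatal*"etcdserver: request timed out"*)
                echo "etcd-timeout" ;;
            *level=fatal*)
                echo "other-fatal" ;;
        esac
    done
}

# Usage (needs cluster access):
#   oc logs --previous noobaa-operator-68bb5bff97-hbpzh | classify_fatal
```

Collecting the label after each restart would show whether one cause dominates or whether the failures are mixed.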

rkomandu • May 21 '21 08:05

@romayalon

This time I was able to capture the log with the --previous option.

oc get pods

    NAME                                               READY   STATUS        RESTARTS   AGE
    noobaa-core-0                                      1/1     Running       0          6h
    noobaa-db-0                                        1/1     Running       0          6h
    noobaa-default-backing-store-noobaa-pod-0c8f0a32   0/1     Terminating   0          5s
    noobaa-endpoint-877dfcd54-4vwlf                    1/1     Running       0          5h52m
    noobaa-operator-68bb5bff97-hbpzh                   1/1     Running       4          6h

oc logs --previous=true noobaa-operator-68bb5bff97-hbpzh > /tmp/noobaa-operator-logs-20210520bld

noobaa-operator-logs-20210520bld.log

rkomandu • May 21 '21 12:05

Similar to #449.

nimrod-becker • May 25 '21 06:05

@rkomandu assuming this is no longer relevant?

nimrod-becker • Apr 17 '23 16:04

@nimrod-becker, since we are now using the ODF builds for the release as well as for the d/s builds, I suppose this is no longer relevant, as long as the MG collects the data accordingly. WDYT?

rkomandu • Apr 17 '23 16:04

Agreed.

nimrod-becker • Apr 17 '23 16:04