
noobaa-operator going into CrashLoopBackOff state

Open NupurBharati opened this issue 3 years ago • 6 comments

We have set up a native Kubernetes cluster on a bare-metal server, but when I try to install NooBaa on it, the noobaa-operator pod goes into CrashLoopBackOff and its logs show a timeout error.

noobaa-operator pod log:

time="2020-11-03T07:06:56Z" level=info msg="CLI version: 2.3.0\n"
time="2020-11-03T07:06:56Z" level=info msg="noobaa-image: noobaa/noobaa-core:5.5.0\n"
time="2020-11-03T07:06:56Z" level=info msg="operator-image: noobaa/noobaa-operator:2.3.0\n"
time="2020-11-03T07:07:26Z" level=fatal msg="Failed to become leader: Get \"https://10.96.0.1:443/api?timeout=32s\": dial tcp 10.96.0.1:443: i/o timeout"

noobaa install output:

noobaa install
INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] CRD Create:
INFO[0000] ✅ Created: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0000] ✅ Created: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0000] ✅ Created: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0000] ✅ Created: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0000] ✅ Created: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0000]
INFO[0000] Operator Install:
INFO[0000] ✅ Already Exists: Namespace "noobaa"
INFO[0000] ✅ Created: ServiceAccount "noobaa"
INFO[0000] ✅ Created: Role "noobaa"
INFO[0000] ✅ Created: RoleBinding "noobaa"
INFO[0000] ✅ Created: ClusterRole "noobaa.noobaa.io"
INFO[0000] ✅ Created: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0000] ✅ Created: Deployment "noobaa-operator"
INFO[0000]
INFO[0000] System Create:
INFO[0000] ✅ Already Exists: Namespace "noobaa"
INFO[0001] ✅ Created: NooBaa "noobaa"
INFO[0001]
INFO[0001] NOTE:
INFO[0001] - This command has finished applying changes to the cluster.
INFO[0001] - From now on, it only loops and reads the status, to monitor the operator work.
INFO[0001] - You may Ctrl-C at any time to stop the loop and watch it manually.
INFO[0001]
INFO[0001] System Wait Ready:
INFO[0001] ⏳ System Phase is "". Pod "noobaa-operator-c48647fd5-zhbrw" is not yet ready: Phase="Pending". ContainersNotReady (containers with unready status: [noobaa-operator]). ContainersNotReady (containers with unready status: [noobaa-operator]).
INFO[0004] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0007] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0010] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0013] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0016] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0019] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0022] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0025] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0028] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0032] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0034] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0038] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0040] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0043] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0046] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0049] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0052] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0055] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0058] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0061] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0064] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=1. Error ().
INFO[0068] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=1. Error ().
INFO[0070] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=1. Error ().
INFO[0073] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=1. Error ().
INFO[0076] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=1. CrashLoopBackOff (back-off 10s restarting failed container=noobaa-operator pod=noobaa-operator-c48647fd5-zhbrw_noobaa(9ff56026-7842-4e73-b012-f91511df8fc6)).
INFO[0079] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0082] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0085] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0088] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0091] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0094] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0097] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0100] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0103] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0106] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0109] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. Error ().
INFO[0112] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. Error ().
INFO[0115] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. Error ().
INFO[0118] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. Error ().
INFO[0121] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. CrashLoopBackOff (back-off 20s restarting failed container=noobaa-operator pod=noobaa-operator-c48647fd5-zhbrw_noobaa(9ff56026-7842-4e73-b012-f91511df8fc6)).
INFO[0124] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. CrashLoopBackOff (back-off 20s restarting failed container=noobaa-operator pod=noobaa-operator-c48647fd5-zhbrw_noobaa(9ff56026-7842-4e73-b012-f91511df8fc6)).
INFO[0127] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. CrashLoopBackOff (back-off 20s restarting failed container=noobaa-operator pod=noobaa-operator-c48647fd5-zhbrw_noobaa(9ff56026-7842-4e73-b012-f91511df8fc6)).
INFO[0131] ⏳ System Phase is "". Container "noobaa-operator" is not yet ready: RestartCount=2. CrashLoopBackOff (back-off 20s restarting failed container=noobaa-operator pod=noobaa-operator-c48647fd5-zhbrw_noobaa(9ff56026-7842-4e73-b012-f91511df8fc6)).
INFO[0134] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet

NupurBharati avatar Nov 03 '20 12:11 NupurBharati

@NupurBharati Thanks for the information! So the error is:

Failed to become leader: Get \"https://10.96.0.1:443/api?timeout=32s\": dial tcp 10.96.0.1:443: i/o timeout

Which Kubernetes environment are you using? We are not hardcoding these IPs, so the API server IP must be coming from the environment to the client-go SDK. You might be able to learn more by running a separate pod/container and checking how kubectl cluster-info resolves the API server.
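As a quick way to compare the two views (this is a sketch I'm adding here, not tested on your cluster; the pod name and curl image are just illustrative), you could run a throwaway pod and probe the same service IP the operator is failing on:

```shell
# Probe the in-cluster API server service IP (10.96.0.1:443 in your log)
# from a throwaway pod. Image name "curlimages/curl" is an assumption --
# any image with curl will do.
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- \
  sh -c 'curl -k -sS -m 10 https://10.96.0.1:443/version || echo "API server unreachable from pods"'

# Compare with how your own kubectl resolves the API server:
kubectl cluster-info
```

If the pod cannot reach 10.96.0.1:443 but kubectl can reach the address in cluster-info, that usually points at the cluster networking (kube-proxy / CNI) rather than at NooBaa itself.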

guymguym avatar Nov 04 '20 12:11 guymguym

@guymguym Thanks for the response. Here is the output of kubectl cluster-info:

(screenshot of kubectl cluster-info output, showing the API server at 10.95.241.103:6443)

Kubernetes version: 19.2

NupurBharati avatar Nov 05 '20 05:11 NupurBharati

@NupurBharati Thanks for the info! Is this cluster-info ^ running from inside a pod or from your user env? The operator is trying to reach the API server at 10.96.0.1:443, while your kubectl uses 10.95.241.103:6443, which could be the external address.

The next question I would try to answer is whether the API server is accessible from the noobaa-operator pod. One way to test this is to manually patch the noobaa-operator deployment and replace the command with a sleep loop, so that we have time to run manual checks from inside the pod. Something like this - I haven't tested it, so it might require some debugging:

kubectl patch deployment noobaa-operator --patch '{"spec":{"template":{"spec":{"containers":[{"name":"noobaa-operator", "command":["bash", "-x", "-c", "while true; do sleep 60; done"]}] }}}}'

and then try to reach the API server like this:

$ kubectl exec deploy/noobaa-operator -- env | grep KUBERNETES_SERVICE
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_HOST=172.30.0.1

$ kubectl exec deploy/noobaa-operator -- curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://172.30.0.1:443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}
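Note that a 403 like the above only proves that network and TLS reach the API server as an anonymous user. To test what the pod's own service account can do, you could also send its token as a bearer credential (again a sketch, untested; the token and CA paths are the standard in-pod service account mount):

```shell
# Repeat the request authenticated as the pod's service account instead of
# system:anonymous. KUBERNETES_SERVICE_HOST/PORT are the standard in-pod
# env vars shown above.
kubectl exec deploy/noobaa-operator -- sh -c '
  TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
  curl -sS --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
       -H "Authorization: Bearer $TOKEN" \
       https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api'
```

If this returns the API versions instead of a Forbidden status, connectivity and authentication are both fine, and the problem is more likely RBAC or leader-election specifics.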

guymguym avatar Nov 12 '20 13:11 guymguym

I have tried this per the above instructions from @guymguym.

The patching worked:

oc get pods
NAME                                               READY   STATUS        RESTARTS   AGE
noobaa-core-0                                      1/1     Running       0          46h
noobaa-db-0                                        1/1     Running       0          46h
noobaa-default-backing-store-noobaa-pod-2b6943b6   1/1     Running       0          46h
noobaa-endpoint-84d8688756-l5hl4                   1/1     Running       0          46h
noobaa-operator-7ff4798d8d-fdtgk                   0/1     Terminating   0          12s
noobaa-operator-845b54fdb8-kqcjh                   1/1     Running       16         46h
ocs-metrics-exporter-64797b55c4-rbnml              1/1     Running       0          46h
ocs-operator-5f7c888764-kfp57                      1/1     Running       15         46h
rook-ceph-operator-66479dcbc6-4hfj4                1/1     Running       0          46h

kubectl exec deploy/noobaa-operator -- env | grep KUBERNETES_SERVICE
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443

kubectl exec deploy/noobaa-operator -- curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://172.30.0.1:443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

noobaa version
INFO[0000] CLI version: 5.8.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.8.0-20210418
INFO[0000] operator-image: noobaa/noobaa-operator:5.8.0

noobaa status
I0617 02:27:54.954144  430291 request.go:645] Throttling request took 1.016552198s, request: GET:https://api.rkomandu-hpo.cp.fyre.ibm.com:6443/apis/node.k8s.io/v1beta1?timeout=32s
INFO[0001] CLI version: 5.8.0
INFO[0001] noobaa-image: noobaa/noobaa-core:5.8.0-20210418
INFO[0001] operator-image: noobaa/noobaa-operator:master-20210419
INFO[0001] noobaa-db-image: centos/mongodb-36-centos7
INFO[0001] Namespace: openshift-storage

oc version
Client Version: 4.7.8
Server Version: 4.7.8
Kubernetes Version: v1.20.0+7d0a2b2

kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-5-g76a04fc", GitCommit:"95881afb5df065c250d98cf7f30ee4bb6d281acf", GitTreeState:"clean", BuildDate:"2021-04-14T22:34:18Z", GoVersion:"go1.15.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0+7d0a2b2", GitCommit:"7d0a2b269a27413f5f125d30c9d726684886c69a", GitTreeState:"clean", BuildDate:"2021-04-16T13:08:35Z", GoVersion:"go1.15.7", Compiler:"gc", Platform:"linux/amd64"}

rkomandu avatar Jun 17 '21 09:06 rkomandu

@rkomandu So the operator cannot access the API server. Did you change anything regarding the service account / role / cluster role / SCC? I would start by comparing those to another deployment that works. Also, noobaa operator status shows those resource names and whether they exist on the cluster, but it does not check that their content matches what is expected. You can check manually: take the output of noobaa operator yaml and inspect the ServiceAccount, Role, RoleBinding, ClusterRole, and ClusterRoleBinding. Hope this helps.
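One shortcut for the RBAC side of that comparison (my suggestion, not something tried in this thread) is to impersonate the operator's service account with kubectl auth can-i instead of reading every Role and ClusterRole by hand. The namespace and service account name below follow this thread (openshift-storage / noobaa); the resources queried are just examples:

```shell
# Ask the API server what the operator's service account is allowed to do.
# "yes"/"no" answers here reflect the effective RoleBindings/ClusterRoleBindings.
kubectl auth can-i get configmaps -n openshift-storage \
  --as=system:serviceaccount:openshift-storage:noobaa

kubectl auth can-i list noobaas.noobaa.io -n openshift-storage \
  --as=system:serviceaccount:openshift-storage:noobaa
```

If these return "yes" on a working cluster and "no" on the broken one, the diff between the two clusters' bindings is the place to look.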

guymguym avatar Jun 17 '21 09:06 guymguym

Nothing was changed on the NooBaa front...

noobaa operator status
I0617 03:10:31.323965  434891 request.go:645] Throttling request took 1.027234304s, request: GET:https://api.rkomandu-hpo.cp.fyre.ibm.com:6443/apis/flowcontrol.apiserver.k8s.io/v1alpha1?timeout=32s
INFO[0001] ✅ Exists: Namespace "openshift-storage"
INFO[0001] ✅ Exists: ServiceAccount "noobaa"
INFO[0001] ✅ Exists: ServiceAccount "noobaa-endpoint"
INFO[0001] ✅ Exists: Role "ocs-operator.v4.8.0-noobaa-557bdbbc79"
INFO[0001] ✅ Exists: Role "ocs-operator.v4.8.0-noobaa-endpoint-77889c76b8"
INFO[0001] ✅ Exists: RoleBinding "ocs-operator.v4.8.0-noobaa-557bdbbc79"
INFO[0001] ✅ Exists: RoleBinding "ocs-operator.v4.8.0-noobaa-endpoint-77889c76b8"
INFO[0001] ✅ Exists: ClusterRole "ocs-operator.v4.8.0-788d9dbc57"
INFO[0001] ✅ Exists: ClusterRoleBinding "ocs-operator.v4.8.0-788d9dbc57"
INFO[0001] ✅ Exists: Deployment "noobaa-operator"

In the yaml, all of the above show as Exists as well, so maybe something is still missing to check. All of the noobaa and noobaa-endpoint resources are present in noobaa-operator-yaml.txt.

rkomandu avatar Jun 17 '21 10:06 rkomandu