
[BUG] NooBaa is not working in GKE

Open WaterKnight1998 opened this issue 3 years ago • 5 comments

Hi,

I am working with GKE.

First I executed:

# Prepare namespace and set as current (optional)
kubectl create ns noobaa
kubectl config set-context --current --namespace noobaa

When I execute noobaa install --mini=true, it stays in the Connecting phase forever:

INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] System Delete:
INFO[0000] 🗑️  Deleting: NooBaa "noobaa"
INFO[0011] 🗑️  Deleted : NooBaa "noobaa"
INFO[0011] 🗑️  Deleting: PersistentVolumeClaim "db-noobaa-db-0"
INFO[0021] 🗑️  Deleted : PersistentVolumeClaim "db-noobaa-db-0"
INFO[0022]
INFO[0022] Operator Delete:
INFO[0022] 🗑️  Deleting: Deployment "noobaa-operator"
INFO[0022] 🗑️  Deleted : Deployment "noobaa-operator"
INFO[0022] 🗑️  Deleting: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0022] 🗑️  Deleted : ClusterRoleBinding "noobaa.noobaa.io"
INFO[0022] 🗑️  Deleting: ClusterRole "noobaa.noobaa.io"
INFO[0022] 🗑️  Deleted : ClusterRole "noobaa.noobaa.io"
INFO[0022] 🗑️  Deleting: RoleBinding "noobaa"
INFO[0023] 🗑️  Deleted : RoleBinding "noobaa"
INFO[0023] 🗑️  Deleting: Role "noobaa"
INFO[0023] 🗑️  Deleted : Role "noobaa"
INFO[0023] 🗑️  Deleting: ServiceAccount "noobaa"
INFO[0023] 🗑️  Deleted : ServiceAccount "noobaa"
INFO[0023] Namespace Delete: currently disabled (enable with "--cleanup")
INFO[0023] Namespace Status:
INFO[0023] ✅ Exists: Namespace "noobaa"
INFO[0023]
INFO[0023] CRD Delete: currently disabled (enable with "--cleanup")
INFO[0023] CRD Status:
INFO[0023] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
(base) david@T490-PF1XMR5W:/mnt/c/Users/david.lacalle$ noobaa install --mini=true
INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] CRD Create:
INFO[0000] ✅ Already Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0000]
INFO[0000] Operator Install:
INFO[0000] ✅ Already Exists: Namespace "noobaa"
INFO[0000] ✅ Created: ServiceAccount "noobaa"
INFO[0001] ✅ Created: Role "noobaa"
INFO[0001] ✅ Created: RoleBinding "noobaa"
INFO[0001] ✅ Created: ClusterRole "noobaa.noobaa.io"
INFO[0001] ✅ Created: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0001] ✅ Created: Deployment "noobaa-operator"
INFO[0001]
INFO[0001] System Create:
INFO[0001] ✅ Already Exists: Namespace "noobaa"
INFO[0002] ✅ Created: NooBaa "noobaa"
INFO[0002]
INFO[0002] NOTE:
INFO[0002]   - This command has finished applying changes to the cluster.
INFO[0002]   - From now on, it only loops and reads the status, to monitor the operator work.
INFO[0002]   - You may Ctrl-C at any time to stop the loop and watch it manually.
INFO[0002]
INFO[0002] System Wait Ready:
INFO[0002] ⏳ System Phase is "". Pod "noobaa-operator-6b5dbc848-g6p9f" is not yet ready: Phase="Pending". ContainersNotReady (containers with unready status: [noobaa-operator]). ContainersNotReady (containers with unready status: [noobaa-operator]).
INFO[0005] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0008] ⏳ System Phase is "". Pod "noobaa-core-0" is not yet ready: Phase="Pending". ContainersNotReady (containers with unready status: [core]). ContainersNotReady (containers with unready status: [core]).
INFO[0011] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0014] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0017] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0020] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0023] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0026] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0029] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0032] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0035] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0038] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0041] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0044] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0047] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0050] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0053] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0056] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0059] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0062] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0065] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0068] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0071] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0074] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0077] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0080] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0083] ⏳ System Phase is "Connecting". Waiting for phase ready ...

noobaa status output

INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] CRD Status:
INFO[0000] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0001] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0001]
INFO[0001] Operator Status:
INFO[0001] ✅ Exists: Namespace "noobaa"
INFO[0001] ✅ Exists: ServiceAccount "noobaa"
INFO[0001] ✅ Exists: Role "noobaa"
INFO[0001] ✅ Exists: RoleBinding "noobaa"
INFO[0001] ✅ Exists: ClusterRole "noobaa.noobaa.io"
INFO[0001] ✅ Exists: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0001] ✅ Exists: Deployment "noobaa-operator"
INFO[0001]
INFO[0001] System Status:
INFO[0001] ✅ Exists: NooBaa "noobaa"
INFO[0001] ✅ Exists: StatefulSet "noobaa-core"
INFO[0001] ✅ Exists: StatefulSet "noobaa-db"
INFO[0001] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Service "s3"
INFO[0002] ✅ Exists: Service "noobaa-db"
INFO[0002] ✅ Exists: Secret "noobaa-server"
INFO[0002] ❌ Not Found: Secret "noobaa-operator"
INFO[0002] ❌ Not Found: Secret "noobaa-endpoints"
INFO[0002] ❌ Not Found: Secret "noobaa-admin"
INFO[0002] ❌ Not Found: StorageClass "noobaa.noobaa.io"
INFO[0002] ❌ Not Found: BucketClass "noobaa-default-bucket-class"
INFO[0002] ❌ Not Found: Deployment "noobaa-endpoint"
INFO[0002] ❌ Not Found: HorizontalPodAutoscaler "noobaa-endpoint"
INFO[0002] ⬛ (Optional) Not Found: BackingStore "noobaa-default-backing-store"
INFO[0003] ⬛ (Optional) CRD Unavailable: CredentialsRequest "noobaa-aws-cloud-creds"
INFO[0005] ⬛ (Optional) CRD Unavailable: CredentialsRequest "noobaa-azure-cloud-creds"
INFO[0005] ⬛ (Optional) Not Found: Secret "noobaa-azure-container-creds"
INFO[0007] ⬛ (Optional) CRD Unavailable: PrometheusRule "noobaa-prometheus-rules"
INFO[0008] ⬛ (Optional) CRD Unavailable: ServiceMonitor "noobaa-service-monitor"
INFO[0010] ⬛ (Optional) CRD Unavailable: Route "noobaa-mgmt"
INFO[0012] ⬛ (Optional) CRD Unavailable: Route "s3"
INFO[0012] ✅ Exists: PersistentVolumeClaim "db-noobaa-db-0"
INFO[0012] ❌ System Phase is "Connecting"
INFO[0013] ⏳ System Phase is "Connecting". Waiting for phase ready ...
#------------------#
#- Backing Stores -#
#------------------#

No backing stores found.

#------------------#
#- Bucket Classes -#
#------------------#

No bucket classes found.

#-----------------#
#- Bucket Claims -#
#-----------------#

NAMESPACE        NAME                                BUCKET-NAME           STORAGE-CLASS             BUCKET-CLASS   PHASE
mlflow           ceph-delete-bucket-mlflow           mlflow                rook-ceph-delete-bucket                  Bound
spark-operator   ceph-delete-bucket-spark-operator   spark-operator-data   rook-ceph-delete-bucket                  Bound

I opened a shell in noobaa-core-0 to run pings and print the environment variables:

kubectl exec -n noobaa -it noobaa-core-0 bash

Inside here I executed:

ping noobaa-db-0.noobaa-db output

ping: noobaa-db-0.noobaa-db: Name or service not known

ping noobaa-db-0.noobaa-db.pod.cluster.local output

ping: noobaa-db-0.noobaa-db.pod.cluster.local: Name or service not known

env | grep -i db output

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
NOOBAA_DB_PORT_27017_TCP_PORT=27017
NOOBAA_DB_SERVICE_PORT=27017
NOOBAA_DB_PORT_27017_TCP_ADDR=10.84.10.75
NOOBAA_DB_SERVICE_HOST=10.84.10.75
MONGODB_URL=mongodb://noobaa-db-0.noobaa-db/nbcore
NOOBAA_DB_PORT_27017_TCP=tcp://10.84.10.75:27017
container_dbg=
NOOBAA_DB_PORT_27017_TCP_PROTO=tcp
NOOBAA_DB_SERVICE_PORT_MONGODB=27017
NOOBAA_DB_PORT=tcp://10.84.10.75:27017

Environment:

  • Cloud provider or hardware configuration: GCE 1.17.9-gke.630 with Ubuntu Nodes
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-gke.6300", GitCommit:"eb6985a7ebfd53457b0b91ba08fac07597bb87af", GitTreeState:"clean", BuildDate:"2020-09-15T09:20:11Z", GoVersion:"go1.13.9b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): GKE

WaterKnight1998, Oct 01 '20 12:10

Hey @WaterKnight1998 - Many thanks for all that info! Copying the latest update from Slack.

So the question is whether the DB pod is failing because of low resource limits that GKE enforces. You can try updating the noobaa system CR as described in Custom CPU and Memory Resources.
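
As a rough sketch of what such an update could look like, the snippet below builds a merge patch for the NooBaa CR. The field name dbResources is an assumption about the operator's CRD and should be verified first (for example with kubectl get crd noobaas.noobaa.io -o yaml). The helper names and the CPU and memory values are hypothetical placeholders, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch that sets requests and limits
# for the DB pod. "dbResources" is an ASSUMED field name; verify it first.
db_resources_patch() {
  cpu="$1"
  mem="$2"
  printf '{"spec":{"dbResources":{"requests":{"cpu":"%s","memory":"%s"},"limits":{"cpu":"%s","memory":"%s"}}}}\n' \
    "$cpu" "$mem" "$cpu" "$mem"
}

# Apply the patch to the NooBaa CR named "noobaa" in namespace "noobaa".
# Needs cluster access, so it is only defined here and never called.
apply_db_resources() {
  kubectl patch noobaa noobaa -n noobaa --type merge \
    -p "$(db_resources_patch "$1" "$2")"
}
```

Calling db_resources_patch 1 2Gi only prints the patch, which makes it easy to review before apply_db_resources sends it to the cluster.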

The other option is that the DB is working fine, but on GKE the DNS name noobaa-db-0.noobaa-db that we use to connect to the DB pod is not set in DNS.

I would start by checking if the service DNS name works like this:

$ kubectl exec noobaa-core-0 -- curl -s noobaa-db.noobaa.svc.cluster.local:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.

And if so we can set env manually (hoping the operator will not override it back to original):

kubectl set env statefulset/noobaa-core MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore
kubectl set env deployment/noobaa-endpoint MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore

If it doesn't work, we should look more closely at the DB pod and why it isn't working.
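
To compare the names discussed above side by side, each one can be resolved from inside the core pod. This is only an illustrative sketch: the helper names db_dns_candidates and check_db_dns are hypothetical, the default cluster domain cluster.local is assumed, and getent is assumed to be available in the noobaa-core image.

```shell
#!/bin/sh
# Hypothetical helper: print the DNS names that could point at the DB.
# Assumes the default cluster domain "cluster.local".
db_dns_candidates() {
  pod="$1"
  svc="$2"
  ns="$3"
  printf '%s\n' \
    "$pod.$svc" \
    "$pod.$svc.$ns.svc.cluster.local" \
    "$svc.$ns.svc.cluster.local"
}

# Try to resolve every candidate from inside the core pod.
# Needs cluster access, so it is only defined here and never called.
check_db_dns() {
  for name in $(db_dns_candidates noobaa-db-0 noobaa-db noobaa); do
    if kubectl exec -n noobaa noobaa-core-0 -- getent hosts "$name" >/dev/null 2>&1; then
      echo "resolves:         $name"
    else
      echo "does not resolve: $name"
    fi
  done
}
```

Running check_db_dns prints one line per name. Note that the fully qualified per-pod name has the form <pod>.<service>.<namespace>.svc.cluster.local, which differs from the .pod.cluster.local form tried earlier in this thread.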

guymguym, Oct 01 '20 19:10

I have tried with a bigger GKE cluster (16 vCPU and 64 GB RAM) and got the same result.

kubectl exec noobaa-core-0 -- curl noobaa-db.noobaa.svc.cluster.local:27017

Output:

kubectl exec -n noobaa noobaa-core-0 -- curl noobaa-db.noobaa.svc.cluster.local:27017
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56

And if so we can set env manually (hoping the operator will not override it back to original):

kubectl set env statefulset/noobaa-core MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore

statefulset.apps/noobaa-core env updated

kubectl set env deployment/noobaa-endpoint MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore

Error from server (NotFound): deployments.apps "noobaa-endpoint" not found

WaterKnight1998, Oct 02 '20 12:10

Hi, I believe I am hitting this issue. "noobaa install" works fine in a fresh minikube.

In this case I had to downgrade kubernetes to 1.18 to run a (non-noobaa) container that requires 1.18

Attempting “noobaa install”

Kubernetes version

[root@denali2 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9", GitCommit:"94f372e501c973a7fa9eb40ec9ebd2fe7ca69848", GitTreeState:"clean", BuildDate:"2020-09-16T13:47:43Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

What is running

[root@denali2 ~]# kubectl get all -o wide --all-namespaces | grep noob

noobaa  pod/noobaa-core-0                           1/1     Running   26         177m    10.244.2.30      denali5   <none>           <none>
noobaa  pod/noobaa-db-0                             1/1     Running   0          177m    10.244.2.31      denali5   <none>           <none>
noobaa  pod/noobaa-operator-75964464cd-qtgp9        1/1     Running   0          178m    10.244.2.29      denali5   <none>           <none>
default service/s3                                  LoadBalancer   10.99.52.20      <pending>     80:30229/TCP,443:31040/TCP,8444:30512/TCP                  23h     noobaa-s3=noobaa
noobaa  service/noobaa-db                           ClusterIP      10.107.63.185    <none>        27017/TCP                                                  177m    noobaa-db=noobaa
noobaa  service/noobaa-mgmt                         LoadBalancer   10.96.243.148    <pending>     80:31740/TCP,443:30389/TCP,8445:31320/TCP,8446:30869/TCP   177m    noobaa-mgmt=noobaa
noobaa  service/s3                                  LoadBalancer   10.100.63.141    <pending>     80:30058/TCP,443:31574/TCP,8444:30162/TCP                  177m    noobaa-s3=noobaa
noobaa  deployment.apps/noobaa-operator             1/1     1            1           178m    noobaa-operator   noobaa/noobaa-operator:5.5.0-nsfs                                   noobaa-operator=deployment
noobaa  replicaset.apps/noobaa-operator-75964464cd  1         1         1       178m    noobaa-operator   noobaa/noobaa-operator:5.5.0-nsfs                                   noobaa-operator=deployment,pod-template-hash=75964464cd
noobaa  statefulset.apps/noobaa-core                1/1     177m   core                              noobaa/noobaa-core:5.5.0-nsfs
noobaa  statefulset.apps/noobaa-db                  1/1     177m   db                                centos/mongodb-36-centos7

the db pod is listening

2020-10-09T15:20:07.784+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/mongo/cluster/shard1/diagnostic.data'
2020-10-09T15:20:07.784+0000 I NETWORK  [initandlisten] waiting for connections on port 27017

and the core pod is looking... but cannot resolve the name

Oct-9 15:25:53.432 [/17]    [L0] core.util.mongo_client:: _connect: called with mongodb://noobaa-db-0.noobaa-db/nbcore
Oct-9 15:26:33.475 [/17] [ERROR] core.util.mongo_client:: _connect: initial connect failed, will retry failed to connect to server [noobaa-db-0.noobaa-db:27017] on first connect [Error: getaddrinfo ENOTFOUND noobaa-db-0.noobaa-db
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:64:26) {
  name: 'MongoNetworkError',
  errorLabels: [Array],
  [Symbol(mongoErrorContextSymbol)]: {}
}]

Here is what the noobaa-core-0 pod is set up with

[root@denali2 ~]# kubectl exec -it noobaa-core-0 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
[noob@noobaa-core-0 /]$ env | grep -i db
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
NOOBAA_DB_PORT_27017_TCP_PORT=27017
NOOBAA_DB_SERVICE_PORT=27017
NOOBAA_DB_PORT_27017_TCP_ADDR=10.107.63.185
NOOBAA_DB_SERVICE_HOST=10.107.63.185
MONGODB_URL=mongodb://noobaa-db-0.noobaa-db/nbcore
NOOBAA_DB_PORT_27017_TCP=tcp://10.107.63.185:27017
container_dbg=
NOOBAA_DB_PORT_27017_TCP_PROTO=tcp
NOOBAA_DB_SERVICE_PORT_MONGODB=27017
NOOBAA_DB_PORT=tcp://10.107.63.185:27017

I confirmed this behavior on both the private build I am using and a prepackaged linux (latest) build

motorman-ibm, Oct 09 '20 20:10

An update

Updated kubernetes to v1.19.2:

[root@denali2 ~]# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
denali2   Ready    master   15h   v1.19.2
denali4   Ready    <none>   15h   v1.19.2
denali5   Ready    <none>   15h   v1.19.2

Created NFS PV/PVC

Install went fine, so the difference seems to be Kubernetes 1.18 vs. 1.19 (although, for the overall system I was testing, I would have preferred to stay on 1.18).

motorman-ibm, Oct 10 '20 15:10

That's good info, thanks @WaterKnight1998 & @motorman-ibm.

So for Kubernetes 1.18, we know that:

  • The Pod address of the DB didn't work (noobaa-db-0.noobaa-db:27017)
  • The Service address of the DB didn't work (noobaa-db.noobaa.svc.cluster.local:27017)
  • From DB logs it seemed that the pod is running and listening.
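
One further check that might narrow this down, offered only as a sketch: Kubernetes documents per-pod names of the form <pod>.<service> as being published for a headless governing Service (clusterIP: None), so it may be worth confirming how the noobaa-db Service is defined. The helper names is_headless and report_db_service are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given clusterIP value marks a
# headless Service (Kubernetes uses the literal string "None").
is_headless() {
  [ "$1" = "None" ]
}

# Read the clusterIP of the DB Service and report which kind it is.
# Needs cluster access, so it is only defined here and never called.
report_db_service() {
  ip="$(kubectl get svc noobaa-db -n noobaa -o jsonpath='{.spec.clusterIP}')"
  if is_headless "$ip"; then
    echo "noobaa-db is headless"
  else
    echo "noobaa-db has clusterIP $ip"
  fi
}
```

In the kubectl get all output earlier in this thread the noobaa-db Service is listed with CLUSTER-IP 10.107.63.185, which is_headless would treat as not headless. Whether that explains the failed lookups is not established here.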

guymguym, Oct 10 '20 21:10