noobaa-operator
[BUG] NooBaa is not working in GKE
Hi,
I am working with GKE.
First I executed:
# Prepare namespace and set as current (optional)
kubectl create ns noobaa
kubectl config set-context --current --namespace noobaa
When I execute noobaa install --mini=true, it stays in the Connecting phase forever:
INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] System Delete:
INFO[0000] 🗑️ Deleting: NooBaa "noobaa"
INFO[0011] 🗑️ Deleted : NooBaa "noobaa"
INFO[0011] 🗑️ Deleting: PersistentVolumeClaim "db-noobaa-db-0"
INFO[0021] 🗑️ Deleted : PersistentVolumeClaim "db-noobaa-db-0"
INFO[0022]
INFO[0022] Operator Delete:
INFO[0022] 🗑️ Deleting: Deployment "noobaa-operator"
INFO[0022] 🗑️ Deleted : Deployment "noobaa-operator"
INFO[0022] 🗑️ Deleting: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0022] 🗑️ Deleted : ClusterRoleBinding "noobaa.noobaa.io"
INFO[0022] 🗑️ Deleting: ClusterRole "noobaa.noobaa.io"
INFO[0022] 🗑️ Deleted : ClusterRole "noobaa.noobaa.io"
INFO[0022] 🗑️ Deleting: RoleBinding "noobaa"
INFO[0023] 🗑️ Deleted : RoleBinding "noobaa"
INFO[0023] 🗑️ Deleting: Role "noobaa"
INFO[0023] 🗑️ Deleted : Role "noobaa"
INFO[0023] 🗑️ Deleting: ServiceAccount "noobaa"
INFO[0023] 🗑️ Deleted : ServiceAccount "noobaa"
INFO[0023] Namespace Delete: currently disabled (enable with "--cleanup")
INFO[0023] Namespace Status:
INFO[0023] ✅ Exists: Namespace "noobaa"
INFO[0023]
INFO[0023] CRD Delete: currently disabled (enable with "--cleanup")
INFO[0023] CRD Status:
INFO[0023] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0023] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
(base) david@T490-PF1XMR5W:/mnt/c/Users/david.lacalle$ noobaa install --mini=true
INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] CRD Create:
INFO[0000] ✅ Already Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0000] ✅ Already Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0000]
INFO[0000] Operator Install:
INFO[0000] ✅ Already Exists: Namespace "noobaa"
INFO[0000] ✅ Created: ServiceAccount "noobaa"
INFO[0001] ✅ Created: Role "noobaa"
INFO[0001] ✅ Created: RoleBinding "noobaa"
INFO[0001] ✅ Created: ClusterRole "noobaa.noobaa.io"
INFO[0001] ✅ Created: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0001] ✅ Created: Deployment "noobaa-operator"
INFO[0001]
INFO[0001] System Create:
INFO[0001] ✅ Already Exists: Namespace "noobaa"
INFO[0002] ✅ Created: NooBaa "noobaa"
INFO[0002]
INFO[0002] NOTE:
INFO[0002] - This command has finished applying changes to the cluster.
INFO[0002] - From now on, it only loops and reads the status, to monitor the operator work.
INFO[0002] - You may Ctrl-C at any time to stop the loop and watch it manually.
INFO[0002]
INFO[0002] System Wait Ready:
INFO[0002] ⏳ System Phase is "". Pod "noobaa-operator-6b5dbc848-g6p9f" is not yet ready: Phase="Pending". ContainersNotReady (containers with unready status: [noobaa-operator]). ContainersNotReady (containers with unready status: [noobaa-operator]).
INFO[0005] ⏳ System Phase is "". StatefulSet "noobaa-core" is not found yet
INFO[0008] ⏳ System Phase is "". Pod "noobaa-core-0" is not yet ready: Phase="Pending". ContainersNotReady (containers with unready status: [core]). ContainersNotReady (containers with unready status: [core]).
INFO[0011] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0014] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0017] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0020] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0023] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0026] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0029] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0032] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0035] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0038] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0041] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0044] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0047] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0050] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0053] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0056] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0059] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0062] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0065] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0068] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0071] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0074] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0077] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0080] ⏳ System Phase is "Connecting". Waiting for phase ready ...
INFO[0083] ⏳ System Phase is "Connecting". Waiting for phase ready ...
noobaa status
output
INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: noobaa
INFO[0000]
INFO[0000] CRD Status:
INFO[0000] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0001] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0001]
INFO[0001] Operator Status:
INFO[0001] ✅ Exists: Namespace "noobaa"
INFO[0001] ✅ Exists: ServiceAccount "noobaa"
INFO[0001] ✅ Exists: Role "noobaa"
INFO[0001] ✅ Exists: RoleBinding "noobaa"
INFO[0001] ✅ Exists: ClusterRole "noobaa.noobaa.io"
INFO[0001] ✅ Exists: ClusterRoleBinding "noobaa.noobaa.io"
INFO[0001] ✅ Exists: Deployment "noobaa-operator"
INFO[0001]
INFO[0001] System Status:
INFO[0001] ✅ Exists: NooBaa "noobaa"
INFO[0001] ✅ Exists: StatefulSet "noobaa-core"
INFO[0001] ✅ Exists: StatefulSet "noobaa-db"
INFO[0001] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Service "s3"
INFO[0002] ✅ Exists: Service "noobaa-db"
INFO[0002] ✅ Exists: Secret "noobaa-server"
INFO[0002] ❌ Not Found: Secret "noobaa-operator"
INFO[0002] ❌ Not Found: Secret "noobaa-endpoints"
INFO[0002] ❌ Not Found: Secret "noobaa-admin"
INFO[0002] ❌ Not Found: StorageClass "noobaa.noobaa.io"
INFO[0002] ❌ Not Found: BucketClass "noobaa-default-bucket-class"
INFO[0002] ❌ Not Found: Deployment "noobaa-endpoint"
INFO[0002] ❌ Not Found: HorizontalPodAutoscaler "noobaa-endpoint"
INFO[0002] ⬛ (Optional) Not Found: BackingStore "noobaa-default-backing-store"
INFO[0003] ⬛ (Optional) CRD Unavailable: CredentialsRequest "noobaa-aws-cloud-creds"
INFO[0005] ⬛ (Optional) CRD Unavailable: CredentialsRequest "noobaa-azure-cloud-creds"
INFO[0005] ⬛ (Optional) Not Found: Secret "noobaa-azure-container-creds"
INFO[0007] ⬛ (Optional) CRD Unavailable: PrometheusRule "noobaa-prometheus-rules"
INFO[0008] ⬛ (Optional) CRD Unavailable: ServiceMonitor "noobaa-service-monitor"
INFO[0010] ⬛ (Optional) CRD Unavailable: Route "noobaa-mgmt"
INFO[0012] ⬛ (Optional) CRD Unavailable: Route "s3"
INFO[0012] ✅ Exists: PersistentVolumeClaim "db-noobaa-db-0"
INFO[0012] ❌ System Phase is "Connecting"
INFO[0013] ⏳ System Phase is "Connecting". Waiting for phase ready ...
#------------------#
#- Backing Stores -#
#------------------#
No backing stores found.
#------------------#
#- Bucket Classes -#
#------------------#
No bucket classes found.
#-----------------#
#- Bucket Claims -#
#-----------------#
NAMESPACE NAME BUCKET-NAME STORAGE-CLASS BUCKET-CLASS PHASE
mlflow ceph-delete-bucket-mlflow mlflow rook-ceph-delete-bucket Bound
spark-operator ceph-delete-bucket-spark-operator spark-operator-data rook-ceph-delete-bucket Bound
I exec'd into noobaa-core-0 to run pings and print variables:
kubectl exec -n noobaa -it noobaa-core-0 bash
Inside here I executed:
ping noobaa-db-0.noobaa-db
output
ping: noobaa-db-0.noobaa-db: Name or service not known
ping noobaa-db-0.noobaa-db.pod.cluster.local
output
ping: noobaa-db-0.noobaa-db.pod.cluster.local: Name or service not known
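As background for the failed pings: per Kubernetes DNS behavior, a per-pod record like noobaa-db-0.noobaa-db is only published when the governing Service is headless (clusterIP: None); a regular ClusterIP Service only gets the noobaa-db.<namespace>.svc.cluster.local record. A quick sanity check for both (a sketch, assuming kubectl access to the affected cluster):

```shell
# Check whether the noobaa-db Service is headless: a headless Service
# prints "None", a regular ClusterIP Service prints an IP.
kubectl get svc noobaa-db -n noobaa -o jsonpath='{.spec.clusterIP}'; echo

# Try resolving both name forms from inside the core pod
# (getent works even when nslookup/dig are not installed in the image):
kubectl exec -n noobaa noobaa-core-0 -- getent hosts noobaa-db.noobaa.svc.cluster.local
kubectl exec -n noobaa noobaa-core-0 -- getent hosts noobaa-db-0.noobaa-db.noobaa.svc.cluster.local
```

If only the first name resolves, the pod record simply does not exist, which would match the getaddrinfo ENOTFOUND errors in the core logs.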
env | grep -i db
output
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
NOOBAA_DB_PORT_27017_TCP_PORT=27017
NOOBAA_DB_SERVICE_PORT=27017
NOOBAA_DB_PORT_27017_TCP_ADDR=10.84.10.75
NOOBAA_DB_SERVICE_HOST=10.84.10.75
MONGODB_URL=mongodb://noobaa-db-0.noobaa-db/nbcore
NOOBAA_DB_PORT_27017_TCP=tcp://10.84.10.75:27017
container_dbg=
NOOBAA_DB_PORT_27017_TCP_PROTO=tcp
NOOBAA_DB_SERVICE_PORT_MONGODB=27017
NOOBAA_DB_PORT=tcp://10.84.10.75:27017
Environment:
- Cloud provider or hardware configuration: GCE 1.17.9-gke.630 with Ubuntu Nodes
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-gke.6300", GitCommit:"eb6985a7ebfd53457b0b91ba08fac07597bb87af", GitTreeState:"clean", BuildDate:"2020-09-15T09:20:11Z", GoVersion:"go1.13.9b4", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): GKE
Hey @WaterKnight1998 - Many thanks for all that info! Copying the latest update from Slack.
So the question is whether the DB pod is not working because of a low resource limit that GKE is enforcing. You can try to update the NooBaa system CR as described in Custom CPU and Memory Resources.
The other option is that the DB is working fine, but on GKE the DNS name noobaa-db-0.noobaa-db that we use to connect to the DB pod is not registered in DNS.
I would start by checking if the service DNS name works like this:
$ kubectl exec noobaa-core-0 -- curl -s noobaa-db.noobaa.svc.cluster.local:27017
expecting MongoDB's reply: "It looks like you are trying to access MongoDB over HTTP on the native driver port."
And if so we can set env manually (hoping the operator will not override it back to original):
kubectl set env statefulset/noobaa-core MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore
kubectl set env deployment/noobaa-endpoint MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore
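After applying the override, it may be worth confirming that it actually landed on the pod template, since the StatefulSet controller only rolls the change out by recreating the pod (a sketch; the jsonpath filter is my own, not from the thread):

```shell
# Confirm the MONGODB_URL override is present on the StatefulSet's
# pod template (kubectl set env edits the template, not running pods):
kubectl get statefulset noobaa-core -n noobaa \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="MONGODB_URL")].value}'; echo

# The edit triggers a rolling update; if the pod stays stuck, delete
# it so it is recreated with the new environment:
kubectl delete pod noobaa-core-0 -n noobaa
```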
If it doesn't work, we should look closer at the DB pod and why it isn't working.
I have tried with a bigger GKE cluster (16 vCPU and 64 GB RAM) and got the same result:
kubectl exec noobaa-core-0 -- curl noobaa-db.noobaa.svc.cluster.local:27017
Output:
kubectl exec -n noobaa noobaa-core-0 -- curl noobaa-db.noobaa.svc.cluster.local:27017
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
And if so we can set env manually (hoping the operator will not override it back to original):
kubectl set env statefulset/noobaa-core MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore
statefulset.apps/noobaa-core env updated
kubectl set env deployment/noobaa-endpoint MONGODB_URL=mongodb://noobaa-db.noobaa.svc.cluster.local:27017/nbcore
Error from server (NotFound): deployments.apps "noobaa-endpoint" not found
Hi, I believe I am hitting this issue. "noobaa install" works fine in a fresh minikube.
In this case I had to downgrade kubernetes to 1.18 to run a (non-noobaa) container that requires 1.18
Attempting "noobaa install"
Kubernetes version
[root@denali2 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9", GitCommit:"94f372e501c973a7fa9eb40ec9ebd2fe7ca69848", GitTreeState:"clean", BuildDate:"2020-09-16T13:47:43Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
What is running
[root@denali2 ~]# kubectl get all -o wide --all-namespaces | grep noob
noobaa pod/noobaa-core-0 1/1 Running 26 177m 10.244.2.30 denali5 <none> <none>
noobaa pod/noobaa-db-0 1/1 Running 0 177m 10.244.2.31 denali5 <none> <none>
noobaa pod/noobaa-operator-75964464cd-qtgp9 1/1 Running 0 178m 10.244.2.29 denali5 <none> <none>
default service/s3 LoadBalancer 10.99.52.20 <pending> 80:30229/TCP,443:31040/TCP,8444:30512/TCP 23h noobaa-s3=noobaa
noobaa service/noobaa-db ClusterIP 10.107.63.185 <none> 27017/TCP 177m noobaa-db=noobaa
noobaa service/noobaa-mgmt LoadBalancer 10.96.243.148 <pending> 80:31740/TCP,443:30389/TCP,8445:31320/TCP,8446:30869/TCP 177m noobaa-mgmt=noobaa
noobaa service/s3 LoadBalancer 10.100.63.141 <pending> 80:30058/TCP,443:31574/TCP,8444:30162/TCP 177m noobaa-s3=noobaa
noobaa deployment.apps/noobaa-operator 1/1 1 1 178m
noobaa-operator noobaa/noobaa-operator:5.5.0-nsfs noobaa-operator=deployment
noobaa replicaset.apps/noobaa-operator-75964464cd 1 1 1 178m noobaa-operator noobaa/noobaa-operator:5.5.0-nsfs noobaa-operator=deployment,pod-template-hash=75964464cd
noobaa statefulset.apps/noobaa-core 1/1 177m core noobaa/noobaa-core:5.5.0-nsfs
noobaa statefulset.apps/noobaa-db 1/1 177m db centos/mongodb-36-centos7
the db pod is listening
2020-10-09T15:20:07.784+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/mongo/cluster/shard1/diagnostic.data'
2020-10-09T15:20:07.784+0000 I NETWORK [initandlisten] waiting for connections on port 27017
and the core pod is trying to connect... but cannot resolve the name
Oct-9 15:25:53.432 [/17] [L0] core.util.mongo_client:: _connect: called with mongodb://noobaa-db-0.noobaa-db/nbcore
Oct-9 15:26:33.475 [/17] [ERROR] core.util.mongo_client:: _connect: initial connect failed, will retry failed to connect to server [noobaa-db-0.noobaa-db:27017] on first connect [Error: getaddrinfo ENOTFOUND noobaa-db-0.noobaa-db
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:64:26) {
name: 'MongoNetworkError',
errorLabels: [Array],
[Symbol(mongoErrorContextSymbol)]: {}
}]
Here is what the noobaa-core-0 pod is set up with
[root@denali2 ~]# kubectl exec -it noobaa-core-0 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
[noob@noobaa-core-0 /]$ env | grep -i db
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
NOOBAA_DB_PORT_27017_TCP_PORT=27017
NOOBAA_DB_SERVICE_PORT=27017
NOOBAA_DB_PORT_27017_TCP_ADDR=10.107.63.185
NOOBAA_DB_SERVICE_HOST=10.107.63.185
MONGODB_URL=mongodb://noobaa-db-0.noobaa-db/nbcore
NOOBAA_DB_PORT_27017_TCP=tcp://10.107.63.185:27017
container_dbg=
NOOBAA_DB_PORT_27017_TCP_PROTO=tcp
NOOBAA_DB_SERVICE_PORT_MONGODB=27017
NOOBAA_DB_PORT=tcp://10.107.63.185:27017
I confirmed this behavior on both the private build I am using and a prepackaged linux (latest) build
An update
Updated kubernetes to
[root@denali2 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
denali2 Ready master 15h v1.19.2
denali4 Ready
Created NFS PV/PVC
Install went fine - so it seems to be 1.18 vs. 1.19 (though based on what I was testing as an overall system, I would have preferred to stay on 1.18)
That's good info. thanks @WaterKnight1998 & @motorman-ibm.
So for kube 1.18, we know that:
- The Pod address of the DB didn't work (noobaa-db-0.noobaa-db:27017)
- The Service address of the DB didn't work (noobaa-db.noobaa.svc.cluster.local:27017)
- From the DB logs it seemed that the pod is running and listening.
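To separate the two failure modes reported above (name resolution vs. TCP connectivity), something like the following could be run from the core pod (a sketch, assuming kubectl access and bash in the core image, which the exec sessions above suggest is present):

```shell
# 1) Name resolution: getent's exit status tells you whether the
#    record exists at all.
kubectl exec -n noobaa noobaa-core-0 -- getent hosts noobaa-db-0.noobaa-db \
  || echo "pod DNS record missing"
kubectl exec -n noobaa noobaa-core-0 -- getent hosts noobaa-db.noobaa.svc.cluster.local \
  || echo "service DNS record missing"

# 2) Raw TCP: bash's /dev/tcp opens a plain socket without speaking
#    HTTP. Note that a plain connect can still succeed even when curl
#    reports "Recv failure", since mongod may reset connections that
#    speak HTTP on the driver port.
kubectl exec -n noobaa noobaa-core-0 -- bash -c \
  'exec 3<>/dev/tcp/noobaa-db.noobaa.svc.cluster.local/27017 && echo "tcp connect ok"'
```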