fusion-cloud-native
fusion-classic-rest-service pod could not start
Hi,
we are facing an issue: the fusion-classic-rest-service-0 pod cannot start and is restarted over and over.
fusion-admin-ui-85b6866fbb-hfsxl 1/1 Running 0 26d
fusion-ambassador-8588f45b44-qs976 1/1 Running 0 36d
fusion-api-gateway-5d4cf67975-k56nd 1/1 Running 0 20d
fusion-argo-ui-8db6b5887-2b5tm 1/1 Running 0 36d
fusion-auth-ui-555cfbbf54-qmzqc 1/1 Running 0 26d
fusion-classic-rest-service-0 0/1 Init:CrashLoopBackOff 4699 23d
fusion-devops-ui-6f6c5466bd-r6fvs 1/1 Running 0 26d
fusion-fusion-admin-59cd7d4c96-flxqc 1/1 Running 0 26d
fusion-fusion-indexing-54b6474f57-w7wl7 1/1 Running 26 26d
fusion-fusion-log-forwarder-66bc598c7-wpfss 1/1 Running 0 26d
fusion-insights-6d9cbc5769-p99cj 1/1 Running 0 26d
fusion-job-launcher-5ccc758859-jklbc 1/1 Running 0 26d
fusion-job-rest-server-78897f8886-8kgt8 1/1 Running 0 26d
fusion-ml-model-service-5c4cffd47d-gq5bd 1/1 Running 0 26d
fusion-monitoring-grafana-7f9d5cccf8-6m7bw 1/1 Running 0 36d
fusion-monitoring-prometheus-kube-state-metrics-66f6cc4bb-k8pk7 1/1 Running 0 36d
fusion-monitoring-prometheus-pushgateway-7996489596-r4rs7 1/1 Running 0 36d
fusion-monitoring-prometheus-server-0 2/2 Running 0 36d
fusion-mysql-7b97f56bdc-9rw8s 1/1 Running 0 36d
fusion-pm-ui-747576df49-qqsp6 1/1 Running 0 26d
fusion-pulsar-bookkeeper-0 1/1 Running 0 36d
fusion-pulsar-bookkeeper-1 1/1 Running 0 36d
fusion-pulsar-bookkeeper-2 1/1 Running 0 36d
fusion-pulsar-broker-0 1/1 Running 0 36d
fusion-pulsar-broker-1 1/1 Running 0 36d
fusion-query-pipeline-6dbbf8886c-qswsc 1/1 Running 0 26d
fusion-rest-service-6ffc8f9cc4-ndhw4 1/1 Running 0 26d
fusion-rpc-service-66b5c4885-cjn4j 1/1 Running 0 36d
fusion-rules-ui-9ccb6db59-m74sw 1/1 Running 0 26d
fusion-solr-0 1/1 Running 0 36d
fusion-solr-exporter-6fccf89d5f-4pdxq 1/1 Running 0 36d
fusion-templating-c96f57955-gdh9f 1/1 Running 0 26d
fusion-webapps-69cc458d47-847rj 1/1 Running 0 26d
fusion-workflow-controller-ffc878cc-2scvd 1/1 Running 0 36d
fusion-zookeeper-0 1/1 Running 0 36d
fusion-zookeeper-1 1/1 Running 0 36d
fusion-zookeeper-2 1/1 Running 0 36d
milvus-writable-588d6c755d-w2j8m 1/1 Running 0 36d
seldon-controller-manager-86f68fbcd-dk6db 1/1 Running 6 36d
When I described the failing pod (fusion-classic-rest-service-0), I found that one of its init containers ("check-zk") fails:
Init Containers:
check-zk:
Container ID: containerd://86cbec8dd8bb25ea8239aaa551f28e7bc8164771f911cb29be0a61a79247e5cc
Image: lucidworks/check-fusion-dependency:v1.2.0
Image ID: docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
Port: <none>
Host Port: <none>
Args:
zookeeper
State: Running
Started: Thu, 08 Apr 2021 14:03:56 +0200
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 08 Apr 2021 13:56:55 +0200
Finished: Thu, 08 Apr 2021 13:58:55 +0200
Ready: False
Restart Count: 4700
Limits:
cpu: 200m
memory: 32Mi
Requests:
cpu: 200m
memory: 32Mi
Environment:
ZOOKEEPER_CONNECTION_STRING: fusion-zookeeper-0.fusion-zookeeper-headless:2181,fusion-zookeeper-1.fusion-zookeeper-headless:2181,fusion-zookeeper-2.fusion-zookeeper-headless:2181
CHECK_INTERVAL: 5s
CHECK_TIMEOUT: 2s
TIMEOUT: 2m
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from fusion-classic-rest-service-token-hr8sx (ro)
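The timing variables above suggest a polling loop. Below is a minimal sketch of that behavior, assuming the check is retried every CHECK_INTERVAL (5s) until TIMEOUT (2m) elapses. This is only a reading of the variable names, not the actual source of check-fusion-dependency. The names `wait_for` and `FakeClock` are hypothetical, and the per-attempt CHECK_TIMEOUT (2s) is not modeled.

```python
def wait_for(check, interval_s, timeout_s, now, sleep):
    """Retry check() every interval_s seconds until it succeeds or timeout_s elapses.

    Assumed behavior, inferred from CHECK_INTERVAL and TIMEOUT; not the real tool.
    """
    deadline = now() + timeout_s
    attempts = 0
    while now() < deadline:
        attempts += 1
        if check():
            return True, attempts
        sleep(interval_s)
    return False, attempts


class FakeClock:
    """Simulated clock so the example runs instantly instead of waiting 2 minutes."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
# A check that always fails, like a lookup that can never reach the DNS server.
ok, attempts = wait_for(lambda: False, 5, 120, clock.now, clock.sleep)
print(ok, attempts, clock.t)  # False 24 120.0
```

Under these assumptions a permanently failing check is attempted 24 times and the loop gives up after exactly 2 minutes, which is consistent with the 2-minute gap between the Started and Finished times under Last State.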
So I got the log from the init container:
2021/04/08 12:03:56 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:01 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:06 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:11 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:16 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:21 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:26 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:31 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:36 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:41 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:46 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:51 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:56 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:01 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:06 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:11 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:16 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:21 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:26 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:31 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:36 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:41 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:46 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:51 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:56 Error checking zookeeper is running: Timed out waiting for check to complete successfully
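One way to read the log lines above: the timestamp format and the `dial tcp: lookup ... on ...` wording look like output from Go's standard library, where the address after `on` is the DNS server that was queried and not the address of the host being looked up. Under that assumption, 10.237.48.10:53 would be the resolver configured in the pod (port 53 is the DNS port), and the failure would be in reaching that resolver rather than ZooKeeper itself. The sketch below splits a log line into those parts; `parse_lookup_error` and the regular expression are illustrative helpers, not part of any Fusion tooling.

```python
import re

# Assumed error shape: "lookup <host> on <dns-server>:<port>: <reason>"
LOOKUP_RE = re.compile(
    r"lookup (?P<host>\S+) on (?P<server>\d+(?:\.\d+){3}):(?P<port>\d+):"
)


def parse_lookup_error(line):
    """Return the looked-up host and the DNS server address, or None if no match."""
    m = LOOKUP_RE.search(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "dns_server": m.group("server"),
        "dns_port": int(m.group("port")),
    }


line = (
    "2021/04/08 12:03:56 Check returned error: dial tcp: lookup "
    "fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: "
    "dial udp 10.237.48.10:53: connect: network is unreachable"
)
info = parse_lookup_error(line)
print(info)
```

Read this way, the name being resolved is fusion-zookeeper-2.fusion-zookeeper-headless, and 10.237.48.10 is only the server asked to resolve it.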
Here is a list of all services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
admin ClusterIP 10.237.52.18 <none> 8765/TCP 36d
admin-ui ClusterIP 10.237.60.242 <none> 8080/TCP 36d
auth-ui ClusterIP 10.237.55.132 <none> 8080/TCP 36d
connector-plugin-service ClusterIP 10.237.50.132 <none> 9020/TCP 36d
connectors ClusterIP 10.237.48.90 <none> 9010/TCP 36d
connectors-classic ClusterIP None <none> 9000/TCP 36d
connectors-rpc ClusterIP 10.237.62.128 <none> 8771/TCP 36d
devops-ui ClusterIP 10.237.51.36 <none> 8080/TCP 36d
fusion-ambassador ClusterIP 10.237.60.104 <none> 80/TCP,443/TCP 36d
fusion-argo-ui ClusterIP 10.237.48.220 <none> 2746/TCP 36d
fusion-monitoring-grafana ClusterIP 10.237.61.14 <none> 80/TCP 36d
fusion-monitoring-prometheus-kube-state-metrics ClusterIP None <none> 80/TCP,81/TCP 36d
fusion-monitoring-prometheus-pushgateway ClusterIP 10.237.49.238 <none> 9091/TCP 36d
fusion-monitoring-prometheus-server ClusterIP 10.237.62.189 <none> 80/TCP 36d
fusion-monitoring-prometheus-server-headless ClusterIP None <none> 80/TCP 36d
fusion-mysql ClusterIP 10.237.52.91 <none> 3306/TCP 36d
fusion-pulsar-bookkeeper ClusterIP None <none> 3181/TCP,8000/TCP 36d
fusion-pulsar-broker ClusterIP None <none> 8080/TCP,6650/TCP 36d
fusion-solr-exporter ClusterIP 10.237.54.108 <none> 9983/TCP 36d
fusion-solr-headless ClusterIP None <none> 8983/TCP 36d
fusion-solr-svc ClusterIP 10.237.61.138 <none> 8983/TCP 36d
fusion-zookeeper ClusterIP 10.237.55.91 <none> 2181/TCP,2281/TCP 36d
fusion-zookeeper-headless ClusterIP None <none> 2181/TCP,3888/TCP,2888/TCP,2281/TCP 36d
indexing ClusterIP 10.237.62.46 <none> 8765/TCP 36d
insights ClusterIP 10.237.53.178 <none> 8080/TCP 36d
job-launcher ClusterIP 10.237.50.233 <none> 8083/TCP 36d
job-rest-server ClusterIP 10.237.63.32 <none> 8081/TCP 36d
milvus ClusterIP 10.237.57.195 <none> 19530/TCP,19121/TCP 36d
ml-model-grpc ClusterIP 10.237.63.47 <none> 6565/TCP 36d
ml-model-service ClusterIP 10.237.56.36 <none> 8086/TCP 36d
pm-ui ClusterIP 10.237.61.241 <none> 8080/TCP 36d
proxy LoadBalancer 10.237.56.89 20.50.14.165 6764:31028/TCP 36d
pulsar-broker ClusterIP None <none> 8080/TCP,6650/TCP 36d
query ClusterIP 10.237.50.250 <none> 8787/TCP 36d
rules-ui ClusterIP 10.237.48.49 <none> 8080/TCP 36d
seldon-webhook-service ClusterIP 10.237.53.165 <none> 443/TCP 36d
templating ClusterIP 10.237.54.124 <none> 5250/TCP 36d
webapps ClusterIP 10.237.61.72 <none> 8780/TCP 36d
And a list of endpoints
NAME ENDPOINTS AGE
admin 10.234.1.47:8765 36d
admin-ui 10.234.1.55:8080 36d
auth-ui 10.234.1.40:8080 36d
connector-plugin-service <none> 36d
connectors 10.234.0.132:9010 36d
connectors-classic 36d
connectors-rpc 10.234.1.29:8771 36d
devops-ui 10.234.0.146:8080 36d
fusion-ambassador 10.234.0.151:8443,10.234.0.151:8080 36d
fusion-argo-ui 10.234.1.32:2746 36d
fusion-monitoring-grafana 10.234.0.233:3000 36d
fusion-monitoring-prometheus-kube-state-metrics 10.234.1.42:8081,10.234.1.42:8080 36d
fusion-monitoring-prometheus-pushgateway 10.234.0.139:9091 36d
fusion-monitoring-prometheus-server 10.234.0.145:9090 36d
fusion-monitoring-prometheus-server-headless 10.234.0.145:9090 36d
fusion-mysql 10.234.1.53:3306 36d
fusion-pulsar-bookkeeper 10.234.0.140:8000,10.234.0.246:8000,10.234.1.49:8000 + 3 more... 36d
fusion-pulsar-broker 10.234.0.141:6650,10.234.1.57:6650,10.234.0.141:8080 + 1 more... 36d
fusion-solr-exporter 10.234.1.44:9983 36d
fusion-solr-headless 10.234.0.138:8983 36d
fusion-solr-svc 10.234.0.138:8983 36d
fusion-zookeeper 10.234.0.157:2181,10.234.0.241:2181,10.234.1.43:2181 + 3 more... 36d
fusion-zookeeper-headless 10.234.0.157:2888,10.234.0.241:2888,10.234.1.43:2888 + 9 more... 36d
indexing 10.234.0.149:8765 36d
insights 10.234.0.131:8080 36d
job-launcher 10.234.0.235:8083 36d
job-rest-server 10.234.0.231:8081 36d
milvus 10.234.0.227:19530,10.234.0.227:19121 36d
ml-model-grpc 10.234.0.249:6565 36d
ml-model-service 10.234.0.249:8086 36d
pm-ui 10.234.0.236:8080 36d
proxy 10.234.0.230:6764 36d
pulsar-broker 10.234.0.141:6650,10.234.1.57:6650,10.234.0.141:8080 + 1 more... 36d
query 10.234.1.45:8787 36d
rules-ui 10.234.0.234:8080 36d
seldon-webhook-service 10.234.1.51:443 36d
templating 10.234.0.133:5250 36d
webapps 10.234.0.251:8780 36d
And the description of the endpoint fusion-zookeeper-headless
Name: fusion-zookeeper-headless
Namespace: fusion
Labels: app=zookeeper
app.kubernetes.io/managed-by=Helm
chart=zookeeper-2.4.2
heritage=Helm
release=fusion
service.kubernetes.io/headless=
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-03-02T14:16:05Z
Subsets:
Addresses: 10.234.0.157,10.234.0.241,10.234.1.43
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
server 2888 TCP
client 2181 TCP
tlsclient 2281 TCP
election 3888 TCP
Events: <none>
and the description of the service fusion-zookeeper-headless
Name: fusion-zookeeper-headless
Namespace: fusion
Labels: app=zookeeper
app.kubernetes.io/managed-by=Helm
chart=zookeeper-2.4.2
heritage=Helm
release=fusion
Annotations: meta.helm.sh/release-name: fusion
meta.helm.sh/release-namespace: fusion
Selector: app=zookeeper,release=fusion
Type: ClusterIP
IP: None
Port: client 2181/TCP
TargetPort: client/TCP
Endpoints: 10.234.0.157:2181,10.234.0.241:2181,10.234.1.43:2181
Port: election 3888/TCP
TargetPort: election/TCP
Endpoints: 10.234.0.157:3888,10.234.0.241:3888,10.234.1.43:3888
Port: server 2888/TCP
TargetPort: server/TCP
Endpoints: 10.234.0.157:2888,10.234.0.241:2888,10.234.1.43:2888
Port: tlsclient 2281/TCP
TargetPort: tlsclient/TCP
Endpoints: 10.234.0.157:2281,10.234.0.241:2281,10.234.1.43:2281
Session Affinity: None
Events: <none>
Can somebody please advise me what is wrong, and why the init container of "fusion-classic-rest-service-0" tries to reach fusion-zookeeper-headless on such a strange IP, which differs from the IPs defined in the service fusion-zookeeper-headless?