fusion-classic-rest-service pod could not start

Open cermakp opened this issue 3 years ago • 0 comments

Hi,

We are facing an issue where fusion-classic-rest-service-0 cannot start and is restarted over and over:

fusion-admin-ui-85b6866fbb-hfsxl                                  1/1     Running                 0          26d
fusion-ambassador-8588f45b44-qs976                                1/1     Running                 0          36d
fusion-api-gateway-5d4cf67975-k56nd                               1/1     Running                 0          20d
fusion-argo-ui-8db6b5887-2b5tm                                    1/1     Running                 0          36d
fusion-auth-ui-555cfbbf54-qmzqc                                   1/1     Running                 0          26d
fusion-classic-rest-service-0                                     0/1     Init:CrashLoopBackOff   4699       23d
fusion-devops-ui-6f6c5466bd-r6fvs                                 1/1     Running                 0          26d
fusion-fusion-admin-59cd7d4c96-flxqc                              1/1     Running                 0          26d
fusion-fusion-indexing-54b6474f57-w7wl7                           1/1     Running                 26         26d
fusion-fusion-log-forwarder-66bc598c7-wpfss                       1/1     Running                 0          26d
fusion-insights-6d9cbc5769-p99cj                                  1/1     Running                 0          26d
fusion-job-launcher-5ccc758859-jklbc                              1/1     Running                 0          26d
fusion-job-rest-server-78897f8886-8kgt8                           1/1     Running                 0          26d
fusion-ml-model-service-5c4cffd47d-gq5bd                          1/1     Running                 0          26d
fusion-monitoring-grafana-7f9d5cccf8-6m7bw                        1/1     Running                 0          36d
fusion-monitoring-prometheus-kube-state-metrics-66f6cc4bb-k8pk7   1/1     Running                 0          36d
fusion-monitoring-prometheus-pushgateway-7996489596-r4rs7         1/1     Running                 0          36d
fusion-monitoring-prometheus-server-0                             2/2     Running                 0          36d
fusion-mysql-7b97f56bdc-9rw8s                                     1/1     Running                 0          36d
fusion-pm-ui-747576df49-qqsp6                                     1/1     Running                 0          26d
fusion-pulsar-bookkeeper-0                                        1/1     Running                 0          36d
fusion-pulsar-bookkeeper-1                                        1/1     Running                 0          36d
fusion-pulsar-bookkeeper-2                                        1/1     Running                 0          36d
fusion-pulsar-broker-0                                            1/1     Running                 0          36d
fusion-pulsar-broker-1                                            1/1     Running                 0          36d
fusion-query-pipeline-6dbbf8886c-qswsc                            1/1     Running                 0          26d
fusion-rest-service-6ffc8f9cc4-ndhw4                              1/1     Running                 0          26d
fusion-rpc-service-66b5c4885-cjn4j                                1/1     Running                 0          36d
fusion-rules-ui-9ccb6db59-m74sw                                   1/1     Running                 0          26d
fusion-solr-0                                                     1/1     Running                 0          36d
fusion-solr-exporter-6fccf89d5f-4pdxq                             1/1     Running                 0          36d
fusion-templating-c96f57955-gdh9f                                 1/1     Running                 0          26d
fusion-webapps-69cc458d47-847rj                                   1/1     Running                 0          26d
fusion-workflow-controller-ffc878cc-2scvd                         1/1     Running                 0          36d
fusion-zookeeper-0                                                1/1     Running                 0          36d
fusion-zookeeper-1                                                1/1     Running                 0          36d
fusion-zookeeper-2                                                1/1     Running                 0          36d
milvus-writable-588d6c755d-w2j8m                                  1/1     Running                 0          36d
seldon-controller-manager-86f68fbcd-dk6db                         1/1     Running                 6          36d

When I described the failing pod (fusion-classic-rest-service-0), I found that one of its init containers ("check-zk") fails:

Init Containers:
  check-zk:
    Container ID:  containerd://86cbec8dd8bb25ea8239aaa551f28e7bc8164771f911cb29be0a61a79247e5cc
    Image:         lucidworks/check-fusion-dependency:v1.2.0
    Image ID:      docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
    Port:          <none>
    Host Port:     <none>
    Args:
      zookeeper
    State:          Running
      Started:      Thu, 08 Apr 2021 14:03:56 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 08 Apr 2021 13:56:55 +0200
      Finished:     Thu, 08 Apr 2021 13:58:55 +0200
    Ready:          False
    Restart Count:  4700
    Limits:
      cpu:     200m
      memory:  32Mi
    Requests:
      cpu:     200m
      memory:  32Mi
    Environment:
      ZOOKEEPER_CONNECTION_STRING:  fusion-zookeeper-0.fusion-zookeeper-headless:2181,fusion-zookeeper-1.fusion-zookeeper-headless:2181,fusion-zookeeper-2.fusion-zookeeper-headless:2181
      CHECK_INTERVAL:               5s
      CHECK_TIMEOUT:                2s
      TIMEOUT:                      2m
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from fusion-classic-rest-service-token-hr8sx (ro)
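
The timing settings above can be related to the restart behaviour with a small calculation. This is only a sketch, assuming the check simply retries every CHECK_INTERVAL until TIMEOUT elapses and then exits with an error; the actual logic of lucidworks/check-fusion-dependency is not shown in this issue, and the helper names `parse_duration` and `attempts_before_timeout` are hypothetical.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse simple durations such as '5s' or '2m' (seconds, minutes, hours only)."""
    units = {"s": "seconds", "m": "minutes", "h": "hours"}
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{units[unit]: value})


def attempts_before_timeout(check_interval: str, timeout: str) -> int:
    """Count the checks that start before the overall timeout elapses.

    Assumption: checks start at t = 0, interval, 2 * interval, ... while t < timeout.
    """
    interval = parse_duration(check_interval)
    total = parse_duration(timeout)
    return -(-total // interval)  # ceiling division; timedelta // timedelta is an int


# Values taken from the Environment section of the check-zk init container.
attempts = attempts_before_timeout("5s", "2m")
print(attempts)
```

Under that assumption, each run of the init container would make 24 attempts over two minutes before giving up, which is consistent with the two-minute gap between Started and Finished in the Last State above.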

So I got the log from the init container:

2021/04/08 12:03:56 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:01 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:06 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:11 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:16 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:21 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:26 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:31 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:36 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:41 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:46 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:51 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:56 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:01 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:06 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:11 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:16 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:21 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:26 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:31 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:36 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:41 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:46 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:51 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:56 Error checking zookeeper is running: Timed out waiting for check to complete successfully
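
To make the error lines easier to read, they can be split into their parts. The sketch below assumes the messages follow the "lookup <name> on <server>: <cause>" shape used by Go's net package for DNS errors, which the log lines appear to match; under that reading, the address after "on" is the DNS server being queried (port 53), not a ZooKeeper address. The function name `parse_lookup_error` is hypothetical.

```python
import re

# Assumed shape of the error text: "lookup <host> on <resolver>: <cause>".
LOOKUP_RE = re.compile(r"lookup (?P<host>\S+) on (?P<resolver>\S+?): (?P<cause>.+)$")


def parse_lookup_error(line: str):
    """Return host, resolver IP, resolver port and cause, or None if the line does not match."""
    match = LOOKUP_RE.search(line)
    if match is None:
        return None
    ip, _, port = match.group("resolver").rpartition(":")
    return {
        "host": match.group("host"),
        "resolver_ip": ip,
        "resolver_port": int(port),
        "cause": match.group("cause"),
    }


# First error line from the init container log above.
sample = (
    "2021/04/08 12:03:56 Check returned error: dial tcp: lookup "
    "fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: "
    "dial udp 10.237.48.10:53: connect: network is unreachable"
)
parsed = parse_lookup_error(sample)
print(parsed)
```

Read this way, the name being resolved is fusion-zookeeper-2.fusion-zookeeper-headless, and the "network is unreachable" failure occurs while contacting 10.237.48.10 on port 53, before any connection to ZooKeeper is attempted.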

Here is a list of all services

NAME                                              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                               AGE
admin                                             ClusterIP      10.237.52.18    <none>         8765/TCP                              36d
admin-ui                                          ClusterIP      10.237.60.242   <none>         8080/TCP                              36d
auth-ui                                           ClusterIP      10.237.55.132   <none>         8080/TCP                              36d
connector-plugin-service                          ClusterIP      10.237.50.132   <none>         9020/TCP                              36d
connectors                                        ClusterIP      10.237.48.90    <none>         9010/TCP                              36d
connectors-classic                                ClusterIP      None            <none>         9000/TCP                              36d
connectors-rpc                                    ClusterIP      10.237.62.128   <none>         8771/TCP                              36d
devops-ui                                         ClusterIP      10.237.51.36    <none>         8080/TCP                              36d
fusion-ambassador                                 ClusterIP      10.237.60.104   <none>         80/TCP,443/TCP                        36d
fusion-argo-ui                                    ClusterIP      10.237.48.220   <none>         2746/TCP                              36d
fusion-monitoring-grafana                         ClusterIP      10.237.61.14    <none>         80/TCP                                36d
fusion-monitoring-prometheus-kube-state-metrics   ClusterIP      None            <none>         80/TCP,81/TCP                         36d
fusion-monitoring-prometheus-pushgateway          ClusterIP      10.237.49.238   <none>         9091/TCP                              36d
fusion-monitoring-prometheus-server               ClusterIP      10.237.62.189   <none>         80/TCP                                36d
fusion-monitoring-prometheus-server-headless      ClusterIP      None            <none>         80/TCP                                36d
fusion-mysql                                      ClusterIP      10.237.52.91    <none>         3306/TCP                              36d
fusion-pulsar-bookkeeper                          ClusterIP      None            <none>         3181/TCP,8000/TCP                     36d
fusion-pulsar-broker                              ClusterIP      None            <none>         8080/TCP,6650/TCP                     36d
fusion-solr-exporter                              ClusterIP      10.237.54.108   <none>         9983/TCP                              36d
fusion-solr-headless                              ClusterIP      None            <none>         8983/TCP                              36d
fusion-solr-svc                                   ClusterIP      10.237.61.138   <none>         8983/TCP                              36d
fusion-zookeeper                                  ClusterIP      10.237.55.91    <none>         2181/TCP,2281/TCP                     36d
fusion-zookeeper-headless                         ClusterIP      None            <none>         2181/TCP,3888/TCP,2888/TCP,2281/TCP   36d
indexing                                          ClusterIP      10.237.62.46    <none>         8765/TCP                              36d
insights                                          ClusterIP      10.237.53.178   <none>         8080/TCP                              36d
job-launcher                                      ClusterIP      10.237.50.233   <none>         8083/TCP                              36d
job-rest-server                                   ClusterIP      10.237.63.32    <none>         8081/TCP                              36d
milvus                                            ClusterIP      10.237.57.195   <none>         19530/TCP,19121/TCP                   36d
ml-model-grpc                                     ClusterIP      10.237.63.47    <none>         6565/TCP                              36d
ml-model-service                                  ClusterIP      10.237.56.36    <none>         8086/TCP                              36d
pm-ui                                             ClusterIP      10.237.61.241   <none>         8080/TCP                              36d
proxy                                             LoadBalancer   10.237.56.89    20.50.14.165   6764:31028/TCP                        36d
pulsar-broker                                     ClusterIP      None            <none>         8080/TCP,6650/TCP                     36d
query                                             ClusterIP      10.237.50.250   <none>         8787/TCP                              36d
rules-ui                                          ClusterIP      10.237.48.49    <none>         8080/TCP                              36d
seldon-webhook-service                            ClusterIP      10.237.53.165   <none>         443/TCP                               36d
templating                                        ClusterIP      10.237.54.124   <none>         5250/TCP                              36d
webapps                                           ClusterIP      10.237.61.72    <none>         8780/TCP                              36d

And a list of endpoints

NAME                                              ENDPOINTS                                                          AGE
admin                                             10.234.1.47:8765                                                   36d
admin-ui                                          10.234.1.55:8080                                                   36d
auth-ui                                           10.234.1.40:8080                                                   36d
connector-plugin-service                          <none>                                                             36d
connectors                                        10.234.0.132:9010                                                  36d
connectors-classic                                                                                                   36d
connectors-rpc                                    10.234.1.29:8771                                                   36d
devops-ui                                         10.234.0.146:8080                                                  36d
fusion-ambassador                                 10.234.0.151:8443,10.234.0.151:8080                                36d
fusion-argo-ui                                    10.234.1.32:2746                                                   36d
fusion-monitoring-grafana                         10.234.0.233:3000                                                  36d
fusion-monitoring-prometheus-kube-state-metrics   10.234.1.42:8081,10.234.1.42:8080                                  36d
fusion-monitoring-prometheus-pushgateway          10.234.0.139:9091                                                  36d
fusion-monitoring-prometheus-server               10.234.0.145:9090                                                  36d
fusion-monitoring-prometheus-server-headless      10.234.0.145:9090                                                  36d
fusion-mysql                                      10.234.1.53:3306                                                   36d
fusion-pulsar-bookkeeper                          10.234.0.140:8000,10.234.0.246:8000,10.234.1.49:8000 + 3 more...   36d
fusion-pulsar-broker                              10.234.0.141:6650,10.234.1.57:6650,10.234.0.141:8080 + 1 more...   36d
fusion-solr-exporter                              10.234.1.44:9983                                                   36d
fusion-solr-headless                              10.234.0.138:8983                                                  36d
fusion-solr-svc                                   10.234.0.138:8983                                                  36d
fusion-zookeeper                                  10.234.0.157:2181,10.234.0.241:2181,10.234.1.43:2181 + 3 more...   36d
fusion-zookeeper-headless                         10.234.0.157:2888,10.234.0.241:2888,10.234.1.43:2888 + 9 more...   36d
indexing                                          10.234.0.149:8765                                                  36d
insights                                          10.234.0.131:8080                                                  36d
job-launcher                                      10.234.0.235:8083                                                  36d
job-rest-server                                   10.234.0.231:8081                                                  36d
milvus                                            10.234.0.227:19530,10.234.0.227:19121                              36d
ml-model-grpc                                     10.234.0.249:6565                                                  36d
ml-model-service                                  10.234.0.249:8086                                                  36d
pm-ui                                             10.234.0.236:8080                                                  36d
proxy                                             10.234.0.230:6764                                                  36d
pulsar-broker                                     10.234.0.141:6650,10.234.1.57:6650,10.234.0.141:8080 + 1 more...   36d
query                                             10.234.1.45:8787                                                   36d
rules-ui                                          10.234.0.234:8080                                                  36d
seldon-webhook-service                            10.234.1.51:443                                                    36d
templating                                        10.234.0.133:5250                                                  36d
webapps                                           10.234.0.251:8780                                                  36d

And the description of the fusion-zookeeper-headless endpoint:

Name:         fusion-zookeeper-headless
Namespace:    fusion
Labels:       app=zookeeper
              app.kubernetes.io/managed-by=Helm
              chart=zookeeper-2.4.2
              heritage=Helm
              release=fusion
              service.kubernetes.io/headless=
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-03-02T14:16:05Z
Subsets:
  Addresses:          10.234.0.157,10.234.0.241,10.234.1.43
  NotReadyAddresses:  <none>
  Ports:
    Name       Port  Protocol
    ----       ----  --------
    server     2888  TCP
    client     2181  TCP
    tlsclient  2281  TCP
    election   3888  TCP

Events:  <none>

And the description of the fusion-zookeeper-headless service:

Name:              fusion-zookeeper-headless
Namespace:         fusion
Labels:            app=zookeeper
                   app.kubernetes.io/managed-by=Helm
                   chart=zookeeper-2.4.2
                   heritage=Helm
                   release=fusion
Annotations:       meta.helm.sh/release-name: fusion
                   meta.helm.sh/release-namespace: fusion
Selector:          app=zookeeper,release=fusion
Type:              ClusterIP
IP:                None
Port:              client  2181/TCP
TargetPort:        client/TCP
Endpoints:         10.234.0.157:2181,10.234.0.241:2181,10.234.1.43:2181
Port:              election  3888/TCP
TargetPort:        election/TCP
Endpoints:         10.234.0.157:3888,10.234.0.241:3888,10.234.1.43:3888
Port:              server  2888/TCP
TargetPort:        server/TCP
Endpoints:         10.234.0.157:2888,10.234.0.241:2888,10.234.1.43:2888
Port:              tlsclient  2281/TCP
TargetPort:        tlsclient/TCP
Endpoints:         10.234.0.157:2281,10.234.0.241:2281,10.234.1.43:2281
Session Affinity:  None
Events:            <none>

Can somebody please advise me what is wrong, and why the init container of "fusion-classic-rest-service-0" tries to reach fusion-zookeeper-headless via such a strange IP (10.237.48.10), which differs from the IPs defined in the fusion-zookeeper-headless service?

cermakp, Apr 08 '21 12:04