
adservice in "CrashLoopBackOff" because "java.lang.AbstractMethodError"

yogeek opened this issue on Apr 27, 2022

Describe the bug

We deployed this project to our K8S cluster a while ago and everything was fine, but today we got an alert because one of the adservice replicas is in CrashLoopBackOff. FYI, we added HPAs for each microservice; adservice currently has 3 replicas, but only 1 replica is failing.
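
For context, an HPA equivalent to the ones we added can be created with something like the following (the CPU threshold and replica bounds are illustrative, not necessarily our exact values):

# Illustrative values; we run one HPA per microservice Deployment.
$ kubectl autoscale deployment adservice --cpu-percent=80 --min=1 --max=3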

Logs

$ k get po -l app=adservice
NAME                         READY   STATUS             RESTARTS   AGE
adservice-6c7b756c6d-z88xz   2/2     Running            0          13m
adservice-6c7b756c6d-zs8zp   2/2     Running            0          55d
adservice-85f97644f8-wr2k9   1/2     CrashLoopBackOff   7          15m
Details
E0427 09:23:59.991662    18 throttler_api.cc:85] GRPC: src/core/lib/security/credentials/google_default/google_default_credentials.cc:351 Could not create google default credentials: {"created":"@1651051439.989309694","description":"Failed to create Google credentials","file":"src/core/lib/security/credentials/google_default/google_default_credentials.cc","file_line":285,"referenced_errors":[{"created":"@1651051439.989313928","description":"creds_path unset","file":"src/core/lib/security/credentials/google_default/google_default_credentials.cc","file_line":229},{"created":"@1651051439.989409998","description":"Failed to load file","file":"src/core/lib/iomgr/load_file.cc","file_line":72,"filename":"/root/.config/gcloud/application_default_credentials.json","referenced_errors":[{"created":"@1651051439.989408857","description":"No such file or directory","errno":2,"file":"src/core/lib/iomgr/load_file.cc","file_line":45,"os_error":"No such file or directory","syscall":"fopen"}]}]}
E0427 09:23:59.992007    18 throttler_api.cc:109] Failed to get Google default credentials
E0427 09:23:59.993214    26 throttler_api.cc:290] Profiler API is not initialized, stop profiling
{"instant":{"epochSecond":1651051449,"nanoOfSecond":587000000},"thread":"main","level":"INFO","loggerName":"hipstershop.AdService","message":"Jaeger initialization disabled.","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:09.587Z"}
{"instant":{"epochSecond":1651051451,"nanoOfSecond":468000000},"thread":"main","level":"INFO","loggerName":"hipstershop.AdService","message":"AdService starting.","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:11.468Z"}
{"instant":{"epochSecond":1651051449,"nanoOfSecond":587000000},"thread":"Thread-4","level":"INFO","loggerName":"hipstershop.AdService","message":"Stats enabled","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":14,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:09.587Z"}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.AbstractMethodError: io.grpc.internal.AbstractServerImplBuilder.delegate()Lio/grpc/ServerBuilder;
    at io.grpc.internal.AbstractServerImplBuilder.addService(AbstractServerImplBuilder.java:93)
    at io.grpc.internal.AbstractServerImplBuilder.addService(AbstractServerImplBuilder.java:48)
    at hipstershop.AdService.start(AdService.java:72)
    at hipstershop.AdService.main(AdService.java:333)
Apr 27, 2022 9:24:14 AM io.opencensus.implcore.trace.RecordEventsSpanImpl finalize
SEVERE: Span Sent.com.google.api.client.http.HttpRequest.execute is GC'ed without being ended.
{"instant":{"epochSecond":1651051454,"nanoOfSecond":663000000},"thread":"Thread-4","level":"INFO","loggerName":"hipstershop.AdService","message":"Attempt to register Stackdriver Exporter in 10 seconds ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":14,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:14.663Z"}
{"instant":{"epochSecond":1651051464,"nanoOfSecond":668000000},"thread":"Thread-4","level":"INFO","loggerName":"hipstershop.AdService","message":"Attempt to register Stackdriver Exporter in 10 seconds ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":14,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:24.668Z"}

Screenshots

Environment

Kubernetes 1.21.8 deployed with kubeadm on AWS EC2

Project version = v0.3.6

yogeek, Apr 27, 2022

Hi @yogeek,

Thanks for reporting this issue! :)

Questions

I have some questions:

  1. In your logs, the READY column shows 2 containers per adservice Pod (e.g., 2/2). Are you deploying an additional container next to each adservice container (e.g., a sidecar proxy container)?
  2. Do you have logs from the Pod with the CrashLoopBackOff status?
  3. Could you please run kubectl describe pod <your_pod_name> on the crashing Pod and share the output? (Example commands for this, and for pulling the logs, are sketched just after this list.)
  4. How many Nodes does your Kubernetes cluster have?
  5. How much CPU, memory, and disk (storage) does each Node have? I recommend a cluster with a total of 4 CPUs and 16 GB of memory (RAM) for Online Boutique. I just tried running it on a GKE cluster with 2 CPUs and 8 GB of RAM and got a CrashLoopBackOff error on the cartservice-c77f6b866-q2qrc Pod.
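
For questions 2, 3, and 5, commands along these lines should capture what I'm looking for (using the crashing Pod's name from your output; -c server targets the main adservice container):

# Logs from the previous (crashed) run of the adservice container.
$ kubectl logs adservice-85f97644f8-wr2k9 -c server --previous

# Full Pod description, including events and container states.
$ kubectl describe pod adservice-85f97644f8-wr2k9

# Rough view of Node count, capacity, and current usage.
$ kubectl get nodes -o wide
$ kubectl top nodes   # requires metrics-server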

Failed to Reproduce

I tried reproducing the issue on my cluster (GKE with 2 e2-standard-2 Nodes), with 6 replicas of adservice, using version 0.3.6 of Online Boutique:

...
kind: Deployment
metadata:
  name: adservice
spec:
  replicas: 6
...
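
Equivalently, the replica count can be bumped imperatively instead of editing the manifest, e.g.:

# Illustrative: scale the adservice Deployment to 6 replicas directly.
$ kubectl scale deployment/adservice --replicas=6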

All Pods were running fine for me:

NAME                                     READY   STATUS    RESTARTS   AGE
adservice-85598d856b-5x2gl               1/1     Running   0          4m1s
adservice-85598d856b-5z99m               1/1     Running   0          7m
adservice-85598d856b-9nv92               1/1     Running   0          4m1s
adservice-85598d856b-mnd5g               1/1     Running   0          4m1s
adservice-85598d856b-vwpxf               1/1     Running   0          4m1s
adservice-85598d856b-xhfkh               1/1     Running   0          4m1s
cartservice-c77f6b866-mtndr              1/1     Running   0          7m2s
checkoutservice-654c47f4b6-4wjtd         1/1     Running   0          7m4s
currencyservice-59bc889674-j5kvr         1/1     Running   0          7m1s
emailservice-5b9fff7cb8-qz4lz            1/1     Running   0          7m4s
frontend-77b88cc7cb-klv8x                1/1     Running   0          7m3s
loadgenerator-6958f5bc8b-c95xz           1/1     Running   0          7m1s
paymentservice-68dd9755bb-7rb8h          1/1     Running   0          7m3s
productcatalogservice-557ff44b96-whcm4   1/1     Running   0          7m2s
recommendationservice-64dc9dfbc8-6xbcn   1/1     Running   0          7m4s
redis-cart-5b569cd47-mzwht               1/1     Running   0          7m
shippingservice-5488d5b6cb-8tzm5         1/1     Running   0          7m1s

And my manual smoke test of the frontend worked fine.
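
(If it helps anyone reproducing this: one way to smoke-test the frontend without exposing a LoadBalancer is a simple port-forward. This is a sketch, assuming the frontend container's default port of 8080.)

# Forward local port 8080 to the frontend Deployment, then browse http://localhost:8080
$ kubectl port-forward deployment/frontend 8080:8080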

CC: @ZeBidule, @dchaumond

@Shabirmean, since v0.3.6 of Online Boutique works fine on GKE with 2 e2-standard-2 Nodes (our Quickstart uses 4), and since the CrashLoopBackOff error only occurred (for @yogeek) on one of the three Pods (i.e., Online Boutique is still functional), I will lower the priority to p2. Let me know if that's fine.

NimJay, May 5, 2022

@NimJay thanks for the help

  1. Indeed, we use Istio in this namespace.
  2. The logs from the Pod in CrashLoopBackOff are in my first message, in the collapsed section called "Details".
  3. The kubectl describe pod output:
$ k describe po adservice-785d644577-6qcpb
Name:         adservice-785d644577-6qcpb
Namespace:    gtp-microdemo
Priority:     0
Node:         ip-10-10-103-222.eu-central-1.compute.internal/10.10.103.222
Start Time:   Mon, 25 Apr 2022 16:56:07 +0200
Labels:       app=adservice
              istio.io/rev=1-10-4
              pod-template-hash=785d644577
              security.istio.io/tlsMode=istio
              service.istio.io/canonical-name=adservice
              service.istio.io/canonical-revision=latest
Annotations:  cni.projectcalico.org/containerID: 77bf5982441f14f9c343e5473c6460a7d9627df22cb843b11268b7379d27cf08
              cni.projectcalico.org/podIP: 192.168.246.164/32
              cni.projectcalico.org/podIPs: 192.168.246.164/32
              kubectl.kubernetes.io/default-container: server
              kubectl.kubernetes.io/default-logs-container: server
              sidecar.istio.io/status:
                {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-...
Status:       Running
IP:           192.168.246.164
IPs:
  IP:           192.168.246.164
Controlled By:  ReplicaSet/adservice-785d644577
Init Containers:
  istio-init:
    Container ID:  docker://e635c4f7b9870a035f0c248dd43ff2c27b37332b94f44c4f4fe5fb0b0bb0e921
    Image:         082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2:1.10.4
    Image ID:      docker-pullable://082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2@sha256:988c76a4fd4d4bb8467fe21d629bd3f353895ceedb9a05893eeb63c2fa5c7301
    Port:          <none>
    Host Port:     <none>
    Args:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      169.254.169.254/32
      -b
      *
      -d
      15090,15021,15020
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 25 Apr 2022 16:56:11 +0200
      Finished:     Mon, 25 Apr 2022 16:56:11 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:        10m
      memory:     40Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8t4pd (ro)
Containers:
  server:
    Container ID:   docker://4ab031a0020f7179648c2b82660df603f2214d2ef6af87b808378ea9899c2073
    Image:          082461646758.dkr.ecr.eu-central-1.amazonaws.com/gtp/google-samples/microservices-demo/adservice:v0.3.6
    Image ID:       docker-pullable://082461646758.dkr.ecr.eu-central-1.amazonaws.com/gtp/google-samples/microservices-demo/adservice@sha256:93a9850760505f9695a5127a57477107a86fa2699a68e49a8d26b20f0601e07f
    Port:           9555/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Fri, 06 May 2022 10:51:23 +0200
      Finished:     Fri, 06 May 2022 10:52:23 +0200
    Ready:          False
    Restart Count:  4393
    Limits:
      cpu:     300m
      memory:  300Mi
    Requests:
      cpu:      200m
      memory:   180Mi
    Liveness:   exec [/bin/grpc_health_probe -addr=:9555] delay=20s timeout=1s period=15s #success=1 #failure=3
    Readiness:  exec [/bin/grpc_health_probe -addr=:9555] delay=20s timeout=1s period=15s #success=1 #failure=3
    Environment:
      DISABLE_PROFILER:  1
      DISABLE_DEBUGGER:  1
      PORT:              9555
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8t4pd (ro)
  istio-proxy:
    Container ID:  docker://e354300d665d2729c2df8182996365471983e01b8ef9aa0126293a093b956861
    Image:         082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2:1.10.4
    Image ID:      docker-pullable://082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2@sha256:988c76a4fd4d4bb8467fe21d629bd3f353895ceedb9a05893eeb63c2fa5c7301
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --serviceCluster
      adservice.$(POD_NAMESPACE)
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --log_output_level=default:info
      --concurrency
      2
    State:          Running
      Started:      Mon, 25 Apr 2022 16:56:33 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      10m
      memory:   40Mi
    Readiness:  http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
    Environment:
      JWT_POLICY:                    third-party-jwt
      PILOT_CERT_PROVIDER:           istiod
      CA_ADDR:                       istiod-1-10-4.istio-system.svc:15012
      POD_NAME:                      adservice-785d644577-6qcpb (v1:metadata.name)
      POD_NAMESPACE:                 gtp-microdemo (v1:metadata.namespace)
      INSTANCE_IP:                    (v1:status.podIP)
      SERVICE_ACCOUNT:                (v1:spec.serviceAccountName)
      HOST_IP:                        (v1:status.hostIP)
      CANONICAL_SERVICE:              (v1:metadata.labels['service.istio.io/canonical-name'])
      CANONICAL_REVISION:             (v1:metadata.labels['service.istio.io/canonical-revision'])
      PROXY_CONFIG:                  {"discoveryAddress":"istiod-1-10-4.istio-system.svc:15012","tracing":{"zipkin":{"address":"jaeger-collector.observability:9411"}}}
                                     
      ISTIO_META_POD_PORTS:          [
                                         {"containerPort":9555,"protocol":"TCP"}
                                     ]
      ISTIO_META_APP_CONTAINERS:     server
      ISTIO_META_CLUSTER_ID:         Kubernetes
      ISTIO_META_INTERCEPTION_MODE:  REDIRECT
      ISTIO_META_WORKLOAD_NAME:      adservice
      ISTIO_META_OWNER:              kubernetes://apis/apps/v1/namespaces/gtp-microdemo/deployments/adservice
      ISTIO_META_MESH_ID:            cluster.local
      TRUST_DOMAIN:                  cluster.local
    Mounts:
      /etc/istio/pod from istio-podinfo (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/lib/istio/data from istio-data (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8t4pd (ro)
      /var/run/secrets/tokens from istio-token (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  istio-podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
      limits.cpu -> cpu-limit
      requests.cpu -> cpu-request
  istio-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  43200
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  kube-api-access-8t4pd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                      From     Message
  ----     ------     ----                     ----     -------
  Warning  Unhealthy  38m (x12678 over 10d)    kubelet  Liveness probe failed:
  Warning  BackOff    8m18s (x53679 over 10d)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  3m8s (x12695 over 10d)   kubelet  Readiness probe failed:

  4. Our clusters are multi-tenant (this application just serves as a "testing" app for us to check that the cluster is working OK), and we have cluster-autoscaler, so depending on the workload the cluster varies from roughly 10 to 50 nodes.
  5. Our nodes are t3.xlarge EC2 instances, so 4 CPUs / 16 GB each.

yogeek, May 6, 2022

With the recent changes to the adservice (base image, Gradle, dependencies, etc.), it would be ideal to retest this with the upcoming version 0.3.10 of Online Boutique.
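
Once 0.3.10 is released, retesting should just be a matter of applying the release manifests for that tag, along these lines (the tag name and the usual release manifest path are assumed here):

# Hypothetical until the v0.3.10 tag exists; adjust images if you mirror them to a private registry.
$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/v0.3.10/release/kubernetes-manifests.yaml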

mathieu-benoit, Sep 27, 2022

Closing this issue since we are not able to reproduce it, and it would be great to get an update with a recent version of Online Boutique/adservice. Please feel free to re-open it if you still see the error with the upcoming adservice 0.3.10.

mathieu-benoit, Oct 3, 2022