microservices-demo
adservice in "CrashLoopBackOff" because "java.lang.AbstractMethodError"
Describe the bug
We deployed this project to our K8s cluster a while ago and everything was fine, but today we received an alert because one of the adservice replicas is in CrashLoopBackOff.
FYI, we added an HPA for each microservice; adservice currently has 3 replicas, but only 1 of them is failing.
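For reference, a minimal sketch of how one of these HPAs could be created (the 80% CPU target and the 1-3 replica range here are illustrative assumptions, not our exact settings):
$ kubectl autoscale deployment adservice --cpu-percent=80 --min=1 --max=3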
Logs
$ k get po -l app=adservice
NAME READY STATUS RESTARTS AGE
adservice-6c7b756c6d-z88xz 2/2 Running 0 13m
adservice-6c7b756c6d-zs8zp 2/2 Running 0 55d
adservice-85f97644f8-wr2k9 1/2 CrashLoopBackOff 7 15m
Details
E0427 09:23:59.991662 18 throttler_api.cc:85] GRPC: src/core/lib/security/credentials/google_default/google_default_credentials.cc:351 Could not create google default credentials: {"created":"@1651051439.989309694","description":"Failed to create Google credentials","file":"src/core/lib/security/credentials/google_default/google_default_credentials.cc","file_line":285,"referenced_errors":[{"created":"@1651051439.989313928","description":"creds_path unset","file":"src/core/lib/security/credentials/google_default/google_default_credentials.cc","file_line":229},{"created":"@1651051439.989409998","description":"Failed to load file","file":"src/core/lib/iomgr/load_file.cc","file_line":72,"filename":"/root/.config/gcloud/application_default_credentials.json","referenced_errors":[{"created":"@1651051439.989408857","description":"No such file or directory","errno":2,"file":"src/core/lib/iomgr/load_file.cc","file_line":45,"os_error":"No such file or directory","syscall":"fopen"}]}]}
E0427 09:23:59.992007 18 throttler_api.cc:109] Failed to get Google default credentials
E0427 09:23:59.993214 26 throttler_api.cc:290] Profiler API is not initialized, stop profiling
{"instant":{"epochSecond":1651051449,"nanoOfSecond":587000000},"thread":"main","level":"INFO","loggerName":"hipstershop.AdService","message":"Jaeger initialization disabled.","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:09.587Z"}
{"instant":{"epochSecond":1651051451,"nanoOfSecond":468000000},"thread":"main","level":"INFO","loggerName":"hipstershop.AdService","message":"AdService starting.","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:11.468Z"}
{"instant":{"epochSecond":1651051449,"nanoOfSecond":587000000},"thread":"Thread-4","level":"INFO","loggerName":"hipstershop.AdService","message":"Stats enabled","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":14,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:09.587Z"}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.AbstractMethodError: io.grpc.internal.AbstractServerImplBuilder.delegate()Lio/grpc/ServerBuilder;
at io.grpc.internal.AbstractServerImplBuilder.addService(AbstractServerImplBuilder.java:93)
at io.grpc.internal.AbstractServerImplBuilder.addService(AbstractServerImplBuilder.java:48)
at hipstershop.AdService.start(AdService.java:72)
at hipstershop.AdService.main(AdService.java:333)
Apr 27, 2022 9:24:14 AM io.opencensus.implcore.trace.RecordEventsSpanImpl finalize
SEVERE: Span Sent.com.google.api.client.http.HttpRequest.execute is GC'ed without being ended.
{"instant":{"epochSecond":1651051454,"nanoOfSecond":663000000},"thread":"Thread-4","level":"INFO","loggerName":"hipstershop.AdService","message":"Attempt to register Stackdriver Exporter in 10 seconds ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":14,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:14.663Z"}
{"instant":{"epochSecond":1651051464,"nanoOfSecond":668000000},"thread":"Thread-4","level":"INFO","loggerName":"hipstershop.AdService","message":"Attempt to register Stackdriver Exporter in 10 seconds ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":14,"threadPriority":5,"logging.googleapis.com/trace":"00000000000000000000000000000000","logging.googleapis.com/spanId":"0000000000000000","logging.googleapis.com/traceSampled":"false","time":"2022-04-27T09:24:24.668Z"}
Environment
Kubernetes 1.21.8 deployed with kubeadm on AWS EC2
Project version = v0.3.6
Hi @yogeek,
Thanks for reporting this issue! :)
Questions
I have some questions:
- In your logs, I see 2/2 under the READY column of each adservice Pod. Are you deploying an additional container next to each adservice container (e.g., a sidecar proxy container)?
- Do you have logs from the Pod with the CrashLoopBackOff status?
- Could you please run kubectl describe pod your_pod_name on the crashing Pod and share the output? (See the example commands right after this list.)
- How many Nodes does your Kubernetes cluster have?
- How much CPU, memory, and disk (storage) does each Node have? I recommend a cluster with a total of 4 CPUs and 16 GB of memory/RAM for Online Boutique. I just tried running on a (GKE) cluster with 2 CPUs and 8 GB of RAM, and received a CrashLoopBackOff error in the cartservice-c77f6b866-q2qrc Pod.
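For example, a sketch of the commands that would capture this (the Pod name is taken from your kubectl get po output above, and the app container in this project's manifests is named server; add -n your_namespace if needed):
$ kubectl logs adservice-85f97644f8-wr2k9 -c server --previous   # app-container logs from the last crashed run
$ kubectl describe pod adservice-85f97644f8-wr2k9                # events, probe results, restart count, exit codes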
Failed to Reproduce
I tried reproducing the issue on my cluster (GKE with 2 e2-standard-2 Nodes), with 6 replicas of adservice, using version 0.3.6 of Online Boutique:
...
kind: Deployment
metadata:
name: adservice
spec:
replicas: 6
...
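(Equivalently, for a quick test without editing the manifest, the Deployment can be scaled imperatively; a minimal sketch, assuming the default namespace:)
$ kubectl scale deployment adservice --replicas=6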
All Pods were running fine for me:
NAME READY STATUS RESTARTS AGE
adservice-85598d856b-5x2gl 1/1 Running 0 4m1s
adservice-85598d856b-5z99m 1/1 Running 0 7m
adservice-85598d856b-9nv92 1/1 Running 0 4m1s
adservice-85598d856b-mnd5g 1/1 Running 0 4m1s
adservice-85598d856b-vwpxf 1/1 Running 0 4m1s
adservice-85598d856b-xhfkh 1/1 Running 0 4m1s
cartservice-c77f6b866-mtndr 1/1 Running 0 7m2s
checkoutservice-654c47f4b6-4wjtd 1/1 Running 0 7m4s
currencyservice-59bc889674-j5kvr 1/1 Running 0 7m1s
emailservice-5b9fff7cb8-qz4lz 1/1 Running 0 7m4s
frontend-77b88cc7cb-klv8x 1/1 Running 0 7m3s
loadgenerator-6958f5bc8b-c95xz 1/1 Running 0 7m1s
paymentservice-68dd9755bb-7rb8h 1/1 Running 0 7m3s
productcatalogservice-557ff44b96-whcm4 1/1 Running 0 7m2s
recommendationservice-64dc9dfbc8-6xbcn 1/1 Running 0 7m4s
redis-cart-5b569cd47-mzwht 1/1 Running 0 7m
shippingservice-5488d5b6cb-8tzm5 1/1 Running 0 7m1s
And my manual smoke test of the frontend worked fine.
CC: @ZeBidule, @dchaumond
@Shabirmean, since v0.3.6 of Online Boutique works fine on GKE with 2 e2-standard-2 Nodes (our Quickstart uses 4) and since the CrashLoopBackOff error only occurred (for @yogeek) on one of the three Pods (i.e., the Online Boutique is still functional), I will bring the priority down to p2. Let me know if this is fine.
@NimJay thanks for the help
- we use istio in this namespace indeed
- the logs from the Pods in CrashLoopBackOff are in my first message, in a collapsed section called "Details"
- kubectl describe pod output:
k describe po
$ k describe po adservice-785d644577-6qcpb
Name: adservice-785d644577-6qcpb
Namespace: gtp-microdemo
Priority: 0
Node: ip-10-10-103-222.eu-central-1.compute.internal/10.10.103.222
Start Time: Mon, 25 Apr 2022 16:56:07 +0200
Labels: app=adservice
istio.io/rev=1-10-4
pod-template-hash=785d644577
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=adservice
service.istio.io/canonical-revision=latest
Annotations: cni.projectcalico.org/containerID: 77bf5982441f14f9c343e5473c6460a7d9627df22cb843b11268b7379d27cf08
cni.projectcalico.org/podIP: 192.168.246.164/32
cni.projectcalico.org/podIPs: 192.168.246.164/32
kubectl.kubernetes.io/default-container: server
kubectl.kubernetes.io/default-logs-container: server
sidecar.istio.io/status:
{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-...
Status: Running
IP: 192.168.246.164
IPs:
IP: 192.168.246.164
Controlled By: ReplicaSet/adservice-785d644577
Init Containers:
istio-init:
Container ID: docker://e635c4f7b9870a035f0c248dd43ff2c27b37332b94f44c4f4fe5fb0b0bb0e921
Image: 082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2:1.10.4
Image ID: docker-pullable://082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2@sha256:988c76a4fd4d4bb8467fe21d629bd3f353895ceedb9a05893eeb63c2fa5c7301
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
169.254.169.254/32
-b
*
-d
15090,15021,15020
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Apr 2022 16:56:11 +0200
Finished: Mon, 25 Apr 2022 16:56:11 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8t4pd (ro)
Containers:
server:
Container ID: docker://4ab031a0020f7179648c2b82660df603f2214d2ef6af87b808378ea9899c2073
Image: 082461646758.dkr.ecr.eu-central-1.amazonaws.com/gtp/google-samples/microservices-demo/adservice:v0.3.6
Image ID: docker-pullable://082461646758.dkr.ecr.eu-central-1.amazonaws.com/gtp/google-samples/microservices-demo/adservice@sha256:93a9850760505f9695a5127a57477107a86fa2699a68e49a8d26b20f0601e07f
Port: 9555/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Fri, 06 May 2022 10:51:23 +0200
Finished: Fri, 06 May 2022 10:52:23 +0200
Ready: False
Restart Count: 4393
Limits:
cpu: 300m
memory: 300Mi
Requests:
cpu: 200m
memory: 180Mi
Liveness: exec [/bin/grpc_health_probe -addr=:9555] delay=20s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=:9555] delay=20s timeout=1s period=15s #success=1 #failure=3
Environment:
DISABLE_PROFILER: 1
DISABLE_DEBUGGER: 1
PORT: 9555
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8t4pd (ro)
istio-proxy:
Container ID: docker://e354300d665d2729c2df8182996365471983e01b8ef9aa0126293a093b956861
Image: 082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2:1.10.4
Image ID: docker-pullable://082461646758.dkr.ecr.eu-central-1.amazonaws.com/istio/proxyv2@sha256:988c76a4fd4d4bb8467fe21d629bd3f353895ceedb9a05893eeb63c2fa5c7301
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--serviceCluster
adservice.$(POD_NAMESPACE)
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
--concurrency
2
State: Running
Started: Mon, 25 Apr 2022 16:56:33 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: third-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod-1-10-4.istio-system.svc:15012
POD_NAME: adservice-785d644577-6qcpb (v1:metadata.name)
POD_NAMESPACE: gtp-microdemo (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
CANONICAL_SERVICE: (v1:metadata.labels['service.istio.io/canonical-name'])
CANONICAL_REVISION: (v1:metadata.labels['service.istio.io/canonical-revision'])
PROXY_CONFIG: {"discoveryAddress":"istiod-1-10-4.istio-system.svc:15012","tracing":{"zipkin":{"address":"jaeger-collector.observability:9411"}}}
ISTIO_META_POD_PORTS: [
{"containerPort":9555,"protocol":"TCP"}
]
ISTIO_META_APP_CONTAINERS: server
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_WORKLOAD_NAME: adservice
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/gtp-microdemo/deployments/adservice
ISTIO_META_MESH_ID: cluster.local
TRUST_DOMAIN: cluster.local
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8t4pd (ro)
/var/run/secrets/tokens from istio-token (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
limits.cpu -> cpu-limit
requests.cpu -> cpu-request
istio-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 43200
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
kube-api-access-8t4pd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 38m (x12678 over 10d) kubelet Liveness probe failed:
Warning BackOff 8m18s (x53679 over 10d) kubelet Back-off restarting failed container
Warning Unhealthy 3m8s (x12695 over 10d) kubelet Readiness probe failed:
- our clusters are multi-tenant (this application is just a "testing" app for us to check that the cluster is working OK) and we use cluster-autoscaler, so depending on the workload the cluster varies from approximately 10 to 50 Nodes
- our Nodes are t3.xlarge EC2 instances, so 4 CPUs / 16 GB of RAM each
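Given the Liveness/Readiness probe failures in the events above, one quick check is to run the same health probe by hand inside the app container; a sketch reusing the Pod name, namespace, container, and probe command from the describe output (this assumes the container stays up long enough to exec into it):
$ kubectl exec adservice-785d644577-6qcpb -c server -n gtp-microdemo -- /bin/grpc_health_probe -addr=:9555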
With the recent changes to the adservice (base image, Gradle, dependencies, etc.), it would be ideal to retest this with the upcoming version 0.3.10 of Online Boutique.
Closing this issue since we are not able to reproduce it, and it would be great to get an update with a recent version of Online Boutique/adservice. Please feel free to re-open it if you still see the errors with the upcoming adservice version 0.3.10.
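For anyone retesting once v0.3.10 is out, a minimal sketch for checking and bumping the adservice image (the registry path and tag below are assumptions based on the public release manifests; a mirrored registry such as the ECR one above would need the corresponding path):
$ kubectl get deployment adservice -o jsonpath='{.spec.template.spec.containers[0].image}'
$ kubectl set image deployment/adservice server=gcr.io/google-samples/microservices-demo/adservice:v0.3.10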