
adservice, cartservice and loadgenerator restarting on arm cluster in the loop

Open elinesterov opened this issue 1 year ago • 7 comments

Describe the bug

adservice, cartservice and loadgenerator keep restarting in a loop on a kind cluster

To Reproduce

clone the repo
kind create cluster
kubectl apply -f release/kubernetes-manifests.yaml

kubectl get pods

get pod
NAME                                     READY   STATUS             RESTARTS       AGE
adservice-5464cc8db4-w9lsm               0/1     CrashLoopBackOff   5 (37s ago)    6m41s
cartservice-6458db7c7c-wz4rd             0/1     CrashLoopBackOff   5 (63s ago)    6m41s
checkoutservice-55b497bfb8-wb9hk         1/1     Running            0              6m42s
currencyservice-6f868d85d8-7t4vj         1/1     Running            1 (115s ago)   6m41s
emailservice-5cf5fc5898-h2twr            1/1     Running            0              6m42s
frontend-bfdf66596-gdq6g                 1/1     Running            0              6m42s
loadgenerator-6568b868f-vvwxm            0/1     CrashLoopBackOff   5 (105s ago)   6m41s
paymentservice-5ff68d9c7d-8w2fw          1/1     Running            0              6m42s
productcatalogservice-5b9c9f6488-dgtst   1/1     Running            0              6m42s
recommendationservice-c58857d6-9sq85     1/1     Running            0              6m42s
redis-cart-79b899577-tg5rv               1/1     Running            0              6m41s
shippingservice-6f65f85b8b-6c72r         1/1     Running            0              6m41s

Logs

k logs adservice-5464cc8db4-w9lsm
Could not create logging file: Read-only file system
COULD NOT CREATE A LOGGINGFILE 20230720-052533.1!Could not create logging file: Read-only file system
COULD NOT CREATE A LOGGINGFILE 20230720-052535.1!Could not create logging file: Read-only file system
COULD NOT CREATE A LOGGINGFILE 20230720-052535.1!E0720 05:25:35.280431    36 throttler_api.cc:92] GRPC: src/core/lib/security/credentials/alts/check_gcp_environment.cc:60 BIOS data file cannot be opened.
E0720 05:25:35.572489    36 throttler_api.cc:92] GRPC: src/core/lib/security/credentials/google_default/google_default_credentials.cc:351 Could not create google default credentials: {"created":"@1689830735.278256041","description":"Failed to create Google credentials","file":"src/core/lib/security/credentials/google_default/google_default_credentials.cc","file_line":284,"referenced_errors":[{"created":"@1689830735.279027375","description":"creds_path unset","file":"src/core/lib/security/credentials/google_default/google_default_credentials.cc","file_line":229},{"created":"@1689830735.280197583","description":"Failed to load file","file":"src/core/lib/iomgr/load_file.cc","file_line":71,"filename":"//.config/gcloud/application_default_credentials.json","referenced_errors":[{"created":"@1689830735.280058666","description":"No such file or directory","errno":2,"file":"src/core/lib/iomgr/load_file.cc","file_line":45,"os_error":"No such file or directory","syscall":"fopen"}]}]}
E0720 05:25:35.573143    36 throttler_api.cc:116] Failed to get Google default credentials
E0720 05:25:35.575800    49 native.cc:42] Could not open maps file: /proc/self/maps
E0720 05:25:35.576072    49 throttler_api.cc:297] Profiler API is not initialized, stop profiling
k logs -f cartservice-6458db7c7c-wz4rd
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://[::]:7070
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app
fail: Microsoft.AspNetCore.Server.Kestrel[13]
      Connection id "0HMS8S3AON72S", Request id "0HMS8S3AON72S:00000001": An unhandled exception was thrown by the application.
      Microsoft.AspNetCore.Routing.RouteCreationException: An error occurred while trying to create an instance of 'Grpc.AspNetCore.Server.Model.Internal.GrpcUnimplementedConstraint'.
       ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
       ---> System.NullReferenceException: Object reference not set to an instance of an object.
         at InvokeStub_GrpcUnimplementedConstraint..ctor(Object, Object, IntPtr*)
         at System.Reflection.ConstructorInvoker.Invoke(Object, IntPtr*, BindingFlags)
         --- End of inner exception stack trace ---
         at System.Reflection.ConstructorInvoker.Invoke(Object, IntPtr*, BindingFlags)
         at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags, Binder, Object[], CultureInfo)
         at System.Reflection.ConstructorInfo.Invoke(Object[] parameters)
         at Microsoft.AspNetCore.Routing.ParameterPolicyActivator.CreateParameterPolicy(IServiceProvider, Type, String)
         at Microsoft.AspNetCore.Routing.ParameterPolicyActivator.ResolveParameterPolicy[T](IDictionary`2, IServiceProvider, String, String& )
         --- End of inner exception stack trace ---
         at Microsoft.AspNetCore.Routing.ParameterPolicyActivator.ResolveParameterPolicy[T](IDictionary`2, IServiceProvider, String, String& )
         at Microsoft.AspNetCore.Routing.DefaultParameterPolicyFactory.Create(RoutePatternParameterPart , String)
         at Microsoft.AspNetCore.Routing.ParameterPolicyFactory.Create(RoutePatternParameterPart , RoutePatternParameterPolicyReference)
         at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.DfaBuilderWorker.AddParentsWithMatchingLiteralConstraints(List`1, DfaNode, RoutePatternParameterPart, IReadOnlyList`1)
         at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.DfaBuilderWorker.ProcessSegment(RouteEndpoint, List`1, List`1, RoutePatternPathSegment)
         at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.DfaBuilderWorker.ProcessLevel(Int32)
         at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.BuildDfaTree(Boolean )
         at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.Build()
         at Microsoft.AspNetCore.Routing.Matching.DataSourceDependentMatcher.CreateMatcher(IReadOnlyList`1)
         at Microsoft.AspNetCore.Routing.DataSourceDependentCache`1.Initialize()
         at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& , Boolean&, Object& , Func`1)
         at System.Threading.LazyInitializer.EnsureInitialized[T](T& , Boolean&, Object& , Func`1)
         at Microsoft.AspNetCore.Routing.DataSourceDependentCache`1.EnsureInitialized()
         at Microsoft.AspNetCore.Routing.Matching.DataSourceDependentMatcher..ctor(EndpointDataSource, Lifetime, Func`1)
         at Microsoft.AspNetCore.Routing.Matching.DfaMatcherFactory.CreateMatcher(EndpointDataSource)
         at Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware.InitializeCoreAsync()
      --- End of stack trace from previous location ---
         at Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware.<Invoke>g__AwaitMatcher|8_0(EndpointRoutingMiddleware, HttpContext, Task`1)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1)

Screenshots

Environment

Mac OS 13.4.1 (22F82)
kind v0.18.0 go1.20.2 darwin/arm64
docker version
Client:
 Cloud integration: v1.0.35
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:51:16 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.21.1 (114176)
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:50:59 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Additional context

Exposure

elinesterov avatar Jul 20 '23 05:07 elinesterov

I dug a bit deeper, and it seems the resource constraints are too tight for adservice and cartservice. Increasing the CPU limit to 800m helps move the needle, but it takes a long time (~2 minutes) before the service becomes reachable; increasing the limit to a whole vCPU makes startup faster. I don't know why it takes so many resources during startup on kind.

Another important piece of information that I may have left out before: I'm running all this on an Apple M2.
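
For anyone who wants to try the same workaround, here is a minimal sketch using `kubectl set resources`. It assumes the deployments are named after the services, as in the pod list above, and uses the whole-vCPU limit mentioned in this comment. The `cpu_limit_cmd` helper is hypothetical and only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch: raise the CPU limit of the slow-starting deployments.
# Assumption: deployments are named after the services (adservice, cartservice).

# Hypothetical helper: print the kubectl command instead of running it.
cpu_limit_cmd() {
  # $1 = deployment name, $2 = CPU limit (for example 800m or 1)
  printf 'kubectl set resources deployment %s --limits=cpu=%s\n' "$1" "$2"
}

for svc in adservice cartservice; do
  cpu_limit_cmd "$svc" 1
done
```

Piping the printed lines to `sh` applies them against the current kubectl context. Note that this only changes the live deployments; re-applying the release manifest restores the original limits.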

elinesterov avatar Jul 20 '23 17:07 elinesterov

ghrr 🤦‍♂️ I just had time to dig into this some more, and it seems both the adservice and cartservice images are built from amd64 images. Of course they'll be slow and require more resources on an arm machine. Can we build multi-arch images?
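
A rough way to confirm this kind of mismatch is to compare the machine or node architecture with the architecture recorded in the image. The sketch below is only illustrative: `normalize_arch` is a hypothetical helper, `IMAGE` is a placeholder for whichever image reference the manifest uses, and the commented commands need cluster access or a locally pulled image.

```shell
#!/bin/sh
# Sketch: compare the host architecture with an image's architecture.

# Hypothetical helper: map the names reported by different tools
# onto the names Docker and Kubernetes use.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# Architecture of the machine running this script.
host_arch=$(normalize_arch "$(uname -m)")
echo "host: $host_arch"

# Architecture of each cluster node (requires cluster access):
#   kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Architecture of a locally pulled image (IMAGE is a placeholder):
#   docker image inspect --format '{{.Os}}/{{.Architecture}}' "$IMAGE"
```

If the image reports `linux/amd64` while the nodes report `arm64`, the containers run under emulation, which would fit the slow startup described above.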

elinesterov avatar Jul 20 '23 22:07 elinesterov

@elinesterov I am facing the same issue. Did you get any fix for this?

gcp-innovate avatar Jul 26 '23 16:07 gcp-innovate

pod/adservice-76b59c7744-bblh9                0/1   Pending   0   11m
pod/cartservice-79ffddbfc9-7mklz              0/1   Pending   0   11m
pod/checkoutservice-67b7cc98bd-lt2wp          1/1   Running   0   11m
pod/currencyservice-86f65c677b-795r4          1/1   Running   0   11m
pod/emailservice-5b9b6b4978-wg67g             1/1   Running   0   11m
pod/frontend-758559b46-thtsn                  1/1   Running   0   11m
pod/loadgenerator-7f5bb4f549-rpgcm            0/1   Pending   0   11m
pod/paymentservice-84d58ff866-lllzj           1/1   Running   0   11m
pod/productcatalogservice-65bd9bb7dd-rvrzk    1/1   Running   0   11m
pod/recommendationservice-5c5f746db6-227j9    1/1   Running   0   11m
pod/redis-cart-65f8cb8d5f-jsqwb               1/1   Running   0   11m
pod/shippingservice-b7b489f4b-vzd8t           1/1   Running   0   11m

NAME                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
service/adservice               ClusterIP      172.16.0.68    <none>          9555/TCP       11m
service/cartservice             ClusterIP      172.16.0.52    <none>          7070/TCP       11m
service/checkoutservice         ClusterIP      172.16.0.54    <none>          5050/TCP       11m
service/currencyservice         ClusterIP      172.16.0.82    <none>          7000/TCP       11m
service/emailservice            ClusterIP      172.16.0.167   <none>          5000/TCP       11m
service/frontend                ClusterIP      172.16.0.161   <none>          80/TCP         11m
service/frontend-external       LoadBalancer   172.16.0.44    34.150.180.89   80:31214/TCP   11m
service/kubernetes              ClusterIP      172.16.0.1     <none>          443/TCP        162d
service/paymentservice          ClusterIP      172.16.0.175   <none>          50051/TCP      11m
service/productcatalogservice   ClusterIP      172.16.0.109   <none>          3550/TCP       11m
service/recommendationservice   ClusterIP      172.16.0.106   <none>          8080/TCP       11m
service/redis-cart              ClusterIP      172.16.0.168   <none>          6379/TCP       11m
service/shippingservice         ClusterIP      172.16.0.176   <none>          50051/TCP      11m

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/adservice               0/1     1            0           11m
deployment.apps/cartservice             0/1     1            0           11m
deployment.apps/checkoutservice         1/1     1            1           11m
deployment.apps/currencyservice         1/1     1            1           11m
deployment.apps/emailservice            1/1     1            1           11m
deployment.apps/frontend                1/1     1            1           11m
deployment.apps/loadgenerator           0/1     1            0           11m
deployment.apps/paymentservice          1/1     1            1           11m
deployment.apps/productcatalogservice   1/1     1            1           11m
deployment.apps/recommendationservice   1/1     1            1           11m
deployment.apps/redis-cart              1/1     1            1           11m
deployment.apps/shippingservice         1/1     1            1           11m

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/adservice-76b59c7744               1         1         0       11m
replicaset.apps/cartservice-79ffddbfc9             1         1         0       11m
replicaset.apps/checkoutservice-67b7cc98bd         1         1         1       11m
replicaset.apps/currencyservice-86f65c677b         1         1         1       11m
replicaset.apps/emailservice-5b9b6b4978            1         1         1       11m
replicaset.apps/frontend-758559b46                 1         1         1       11m
replicaset.apps/loadgenerator-7f5bb4f549           1         1         0       11m
replicaset.apps/paymentservice-84d58ff866          1         1         1       11m
replicaset.apps/productcatalogservice-65bd9bb7dd   1         1         1       11m
replicaset.apps/recommendationservice-5c5f746db6   1         1         1       11m
replicaset.apps/redis-cart-65f8cb8d5f              1         1         1       11m
replicaset.apps/shippingservice-b7b489f4b          1         1         1       11m

It shows the following message: "Reason Cannot schedule pods: No preemption victims found for incoming pod."

kind: "Event" message: "0/3 nodes are available: 3 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod."

Which sounds strange, as the GKE nodes have a good amount of CPU.
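
One possible explanation: the scheduler compares the sum of CPU requests already placed on a node with that node's allocatable CPU, not with actual usage, so a mostly idle node can still report "Insufficient cpu". The sketch below illustrates that arithmetic with made-up numbers. `to_millicores` and `fits` are hypothetical helpers that handle only whole cores and millicore values, and the 940m, 800m and 200m figures are not taken from this cluster.

```shell
#!/bin/sh
# Sketch: the request-based fit check behind "Insufficient cpu".

# Convert a Kubernetes CPU quantity (200m or 2) to millicores.
# Fractional values such as 0.5 are not handled.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Succeeds if a pod requesting $3 fits on a node with allocatable CPU $1
# and $2 already requested by other pods.
fits() {
  alloc=$(to_millicores "$1")
  used=$(to_millicores "$2")
  req=$(to_millicores "$3")
  [ $(( used + req )) -le "$alloc" ]
}

# Hypothetical numbers: 940m allocatable, 800m already requested, pod asks for 200m.
if fits 940m 800m 200m; then
  echo "schedulable"
else
  echo "Insufficient cpu"
fi

# Real values for a cluster (see "Allocatable" and "Allocated resources"):
#   kubectl describe nodes
```

With these example numbers the pod does not fit, because 800m + 200m exceeds 940m even if the node is idle.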

gcp-innovate avatar Jul 26 '23 16:07 gcp-innovate

Hi, my apologies for the delayed response. We currently do not consider kind clusters a possible deployment destination and do not validate the demo application on kind clusters. The resource constraints that the workload manifests define aim to ensure correct operation of the demo and can conflict with the resources available on kind clusters. Note that you can exclude the load generator from the deployment if you are using the kustomize configurations. If you run your kind cluster on a Mac, you might want to build images for the Mac architecture to reduce the overhead of running cross-architecture containers. Unfortunately, the project does not support this out of the box (#1448).

minherz avatar Jul 26 '23 22:07 minherz

Fixed by changing the node machine type, thanks.

gcp-innovate avatar Jul 27 '23 06:07 gcp-innovate

@minherz

"My apologies for the delayed response. We currently do not consider kind clusters a possible deployment destination and do not validate the demo application on kind clusters."

As you can see in my later comments, the problem is not the kind cluster but rather the architecture of the images. Updating the limits would help a bit, but there is definitely an issue with running an amd64 image containing a C# application on an arm64 machine.

"If you run your kind cluster on a Mac, you might want to build images for the Mac architecture to reduce the overhead of running cross-architecture containers."

You cannot do this because your Dockerfiles pin to amd64-only images.

For instance:

FROM eclipse-temurin:19.0.1_10-jre-alpine@sha256:a75ea64f676041562cd7d3a54a9764bbfb357b2bf1bebf46e2af73e62d32e36c

is clearly an amd64-only image.

You can use eclipse-temurin:19 or eclipse-temurin:19.0.1_10-jre (non-alpine) to build multi-arch images.
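
Once the base image is a multi-arch tag, a build for both architectures could look roughly like the sketch below, which uses Docker Buildx. Everything specific here is an assumption: `multiarch_build_cmd` is a hypothetical helper that only prints the command, and the `example/adservice:dev` tag and `src/adservice` build context are placeholders.

```shell
#!/bin/sh
# Sketch: build one service image for amd64 and arm64 with Docker Buildx.
# Assumptions: the Dockerfile's FROM line already points at a multi-arch
# base tag, and the image tag and context path below are placeholders.

# Hypothetical helper: print the build command so it can be reviewed first.
multiarch_build_cmd() {
  # $1 = image tag, $2 = build context directory
  printf 'docker buildx build --platform linux/amd64,linux/arm64 -t %s %s\n' "$1" "$2"
}

multiarch_build_cmd example/adservice:dev src/adservice

# To list the platforms a base tag provides before switching to it:
#   docker buildx imagetools inspect eclipse-temurin:19
```

Multi-platform results usually have to be pushed to a registry with `--push`. For a local kind cluster on an arm64 Mac, building for a single `--platform linux/arm64` with `--load` and then running `kind load docker-image` may be the simpler route.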

elinesterov avatar Jul 29 '23 16:07 elinesterov

Closing as a duplicate of https://github.com/GoogleCloudPlatform/microservices-demo/issues/622

bourgeoisor avatar Apr 17 '24 22:04 bourgeoisor