Load testing using Locust failing on the Arm64 platform
Hi @NimJay
Describe the bug
I tested the Hipster Shop performance on AWS, GCP, and Azure instances using Locust on a minikube cluster. Earlier, I built the Docker images for Arm64 with the `grpc-health-probe` binary used in the Dockerfile, and verified them by deploying Hipster Shop and load testing with Locust. On Arm64 I could test up to 6500 users without any failures, but on x86_64 I started seeing failures at 2000 users.
Now the `grpc-health-probe` binary is no longer necessary, because since Kubernetes 1.24 the gRPC health-check probe functionality is built into Kubernetes. I've rebuilt the Arm64 Docker images without the `grpc-health-probe` binary and updated kubernetes-manifests.yaml with the images I built. Deployment succeeds and the UI is accessible. However, during Locust load testing, port forwarding fails to handle requests from 2000 users on both Arm64 and x86_64 architectures.
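For reference, the built-in gRPC probe (enabled by default since Kubernetes 1.24) replaces the `grpc-health-probe` exec command with a `grpc` probe field. A minimal sketch, assuming a service exposing gRPC health checks on port 8080 (the port here is illustrative, not the project's actual value):

```yaml
# Native gRPC probes (Kubernetes 1.24+), replacing the grpc-health-probe
# exec probe; point "port" at the container's actual gRPC port.
readinessProbe:
  grpc:
    port: 8080
livenessProbe:
  grpc:
    port: 8080
```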
Due to these latest changes, I am blocked. Could you please share some pointers on this?
To Reproduce
Steps to reproduce the behavior: deployed the app on a minikube cluster and generated the load with Locust using one master node and 11 worker nodes.
- Ran the commands:

```shell
locust -f locustfile.py --master
locust -f locustfile.py --worker --master-host=localhost
```
Logs
load test result at 2000 user.pdf
Screenshots
During load testing, port forwarding fails to handle requests from 2000 or more users.
Environment
- OS: Ubuntu 22.04.2 LTS
- Kubernetes distribution, version: minikube
- Any relevant tool version: Locust 2.15.1
Hi @odidev, Thanks for reporting this issue — your description is very clear and thorough. :)
- Are you suggesting that the removal of `grpc-health-probe` is impacting scalability of Online Boutique? I doubt that the root cause is the removal of `grpc-health-probe`.
- How many instances/replicas of each Pod are you running?
- What error message are you seeing in the logs of the various microservices? Are you, for instance, able to run `kubectl logs --selector app=frontend` or `kubectl logs --selector app=cartservice`?
- This demo app (Online Boutique) doesn't make any promises about how many users it supports, so I will lower the priority on this. Hope that's okay.
> Are you suggesting that the removal of `grpc-health-probe` is impacting scalability of Online Boutique? I doubt that the root cause is the removal of `grpc-health-probe`.
In my observation, there is no major change apart from the removal of `grpc-health-probe` that could impact scalability of Online Boutique for Arm64 load testing.
> How many instances/replicas of each Pod are you running?
I haven’t made any changes in the kubernetes-manifests.yaml file except for the Docker images, so each Pod runs with 1 replica.
```
ubuntu@ip-172-31-34-130:~/microservices-demo$ kubectl get deployment
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
adservice               1/1     1            1           19m
cartservice             1/1     1            1           19m
checkoutservice         1/1     1            1           19m
currencyservice         1/1     1            1           19m
emailservice            1/1     1            1           19m
frontend                1/1     1            1           19m
loadgenerator           1/1     1            1           19m
paymentservice          1/1     1            1           19m
productcatalogservice   1/1     1            1           19m
recommendationservice   1/1     1            1           19m
redis-cart              1/1     1            1           19m
shippingservice         1/1     1            1           19m
```
> What error message are you seeing in the logs of the various microservices? Are you, for instance, able to run `kubectl logs --selector app=frontend` or `kubectl logs --selector app=cartservice`?
Ran the commands as you suggested, both with and without the Locust load; please check the logs below:
Frontend_logs: frontend logs with locust.txt frontend logs without locust.txt
Cartservice_logs: cartservice logs with locust.txt cartservice logs without locust.txt
Please let me know if you need more details for the same.
@NimJay Could you please share your feedback regarding the above issue?
Hi @odidev, Thank you for providing additional logs.
The error messages from your initial issue description sound like the bottleneck might be the network (e.g., the number of concurrent connections).
Example error message:

```
GET /cart ConnectionRefusedError(111, 'Connection refused') 9024
```
You mentioned you're deploying Online Boutique in minikube. Question:
- What are the specs on the machine running minikube?
- How much CPU/memory/etc. has been allocated to the minikube cluster?
- You mentioned that you're using 12 Nodes. Are those 12 Nodes just used for the loadgenerator?
Another thought I have: I don't know exactly how many requests per second each microservice is able to handle. But I find it hard to imagine that running 1 replica of each microservice will support 2000 users, regardless of cluster size.
According to this article:

> A Single CPU core will commonly handle an average of 220 to 250 concurrent connections simultaneously. If for instance, a website runs on a server that has a single CPU with 2 CPU cores, approximately 500 visitors may access and search the website at the same time.
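As a back-of-the-envelope sketch of that rule of thumb (the 220-250 per-core figure is quoted from the article above, not measured here):

```python
# Rough capacity estimate: ~220-250 concurrent connections per CPU core
# (figure from the quoted article; real throughput varies by workload).
CONNECTIONS_PER_CORE = (220, 250)

def estimated_concurrent_users(cores: int) -> tuple[int, int]:
    """Return a (low, high) estimate of concurrent connections for a node."""
    low, high = CONNECTIONS_PER_CORE
    return cores * low, cores * high

# A 2-core server: roughly 440-500 concurrent visitors.
print(estimated_concurrent_users(2))   # (440, 500)
# The 4 CPUs allocated to the minikube cluster in this thread:
print(estimated_concurrent_users(4))   # (880, 1000)
```

By this estimate, even the whole 4-CPU minikube allocation sits well below 2000 concurrent connections, which is consistent with the failures starting around that load.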
Also, each Online Boutique microservice has some K8s fields similar to (see related code):
resources:
requests:
cpu: 100m
memory: 220Mi
limits:
cpu: 200m
memory: 450Mi
You'll need to modify these values to support more users, increase the number of replicas (per Deployment), or set up K8s autoscaling.
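As one illustration of the autoscaling option, a minimal HorizontalPodAutoscaler sketch; targeting the `frontend` Deployment and the 80% CPU utilization threshold are illustrative choices here, not project defaults:

```yaml
# Scale the frontend Deployment between 1 and 10 replicas,
# targeting 80% average CPU utilization (requires metrics-server).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Note that the HPA scales against the `resources.requests` values above, so those still need to be set sensibly for autoscaling to work.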
> What are the specs on the machine running minikube?
AWS instance: m7g.2xlarge has 8 vCPUs, 32 GiB of memory.
> How much CPU/memory/etc. has been allocated to the minikube cluster?
CPUs=4, Memory=7800MB
> You mentioned that you're using 12 Nodes. Are those 12 Nodes just used for the loadgenerator?
Yes, c7g.16xlarge instances for load testing using Locust, with 1 master node and 11 worker nodes.
As suggested, modifying the resource requests and limits in the kubernetes-manifests.yaml file does support more users, but the right requests/limits differ for each microservice. So, I removed the resource requests and limits parameters from kubernetes-manifests.yaml and ran the Locust load test against a single-node minikube deployment on c7g.16xlarge. Now I can test the load for Arm64 on m7g.2xlarge and for x86_64 on m6i.2xlarge up to 6500 users without any failures.
@odidev It looks like your issue was resolved. I'll close this issue for now, but feel free to reopen it.