Load testing using Locust failing on the Arm64 platform
Hi @NimJay
Describe the bug
I tested the Hipster Shop performance on AWS, GCP, and Azure instances using Locust on a minikube cluster. Earlier, I built the Docker images for Arm64 with the `grpc-health-probe` binary used in the Dockerfile, and verified them by deploying Hipster Shop and load testing with Locust. On Arm64 I could test up to 6500 users without any failures, but on x86_64 I started seeing failures at 2000 users.
Now the `grpc-health-probe` binary is no longer necessary, because since Kubernetes 1.24 the gRPC health-check probe functionality is built into Kubernetes. I've rebuilt the Arm64 Docker images without the `grpc-health-probe` binary and updated kubernetes-manifests.yaml with the images I built. Deployment succeeds and the UI is accessible. However, during Locust load testing, port forwarding fails to handle requests from 2000 users on both Arm64 and x86_64 architectures.
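For reference, the built-in gRPC probe (enabled by default since Kubernetes 1.24) replaces the `grpc-health-probe` exec command with a `grpc` probe field. A minimal sketch, assuming a service exposing gRPC health checks on port 8080 (the port here is illustrative, not the project's actual value):

```yaml
# Native gRPC probes (Kubernetes 1.24+), replacing the grpc-health-probe
# exec probe; point "port" at the container's actual gRPC port.
readinessProbe:
  grpc:
    port: 8080
livenessProbe:
  grpc:
    port: 8080
```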
Due to these latest changes, I am blocked. Could you please share some pointers on this?
To Reproduce
Steps to reproduce the behavior: deployed the app on a minikube cluster and generated the load with Locust using one master node and 11 worker nodes.
- Ran the commands:

```shell
locust -f locustfile.py --master
locust -f locustfile.py --worker --master-host=localhost
```
Logs
load test result at 2000 user.pdf
Screenshots
During load testing, port forwarding fails to handle requests from 2000 or more users.
Environment
- OS: Ubuntu 22.04.2 LTS
- Kubernetes distribution, version: minikube
- Any relevant tool version: Locust 2.15.1
Hi @odidev, Thanks for reporting this issue — your description is very clear and thorough. :)
- Are you suggesting that the removal of `grpc-health-probe` is impacting scalability of Online Boutique? I doubt that the root cause is the removal of `grpc-health-probe`.
- How many instances/replicas of each Pod are you running?
- What error message are you seeing in the logs of the various microservices? Are you, for instance, able to run `kubectl logs --selector app=frontend` or `kubectl logs --selector app=cartservice`?
- This demo app (Online Boutique) doesn't make any promises about how many users it supports, so I will lower the priority on this. Hope that's okay.
> Are you suggesting that the removal of `grpc-health-probe` is impacting scalability of Online Boutique? I doubt that the root cause is the removal of `grpc-health-probe`.
In my observation, there is no major change apart from the removal of `grpc-health-probe` that could impact scalability of Online Boutique for Arm64 load testing.
> How many instances/replicas of each Pod are you running?
I haven’t made any changes in the kubernetes-manifests.yaml file except for the Docker images, so each Pod runs with 1 replica.
```
ubuntu@ip-172-31-34-130:~/microservices-demo$ kubectl get deployment
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
adservice               1/1     1            1           19m
cartservice             1/1     1            1           19m
checkoutservice         1/1     1            1           19m
currencyservice         1/1     1            1           19m
emailservice            1/1     1            1           19m
frontend                1/1     1            1           19m
loadgenerator           1/1     1            1           19m
paymentservice          1/1     1            1           19m
productcatalogservice   1/1     1            1           19m
recommendationservice   1/1     1            1           19m
redis-cart              1/1     1            1           19m
shippingservice         1/1     1            1           19m
```
> What error message are you seeing in the logs of the various microservices? Are you, for instance, able to run `kubectl logs --selector app=frontend` or `kubectl logs --selector app=cartservice`?
Ran the commands as you suggested, both with and without the Locust load; please check the logs below:
Frontend_logs: frontend logs with locust.txt frontend logs without locust.txt
Cartservice_logs: cartservice logs with locust.txt cartservice logs without locust.txt
Please let me know if you need more details for the same.
@NimJay Could you please share your feedback regarding the above issue?
Hi @odidev, Thank you for providing additional logs.
The error messages from your initial issue description sound like the bottleneck might be the network (e.g., the number of concurrent connections).
Example error message:

```
GET /cart ConnectionRefusedError(111, 'Connection refused') 9024
```
You mentioned you're deploying Online Boutique in minikube. Question:
- What are the specs on the machine running minikube?
- How much CPU/memory/etc. has been allocated to the minikube cluster?
- You mentioned that you're using 12 Nodes. Are those 12 Nodes just used for the loadgenerator?
Another thought I have: I don't know exactly how many requests per second each microservice is able to handle. But I find it hard to imagine that running 1 replica of each microservice will support 2000 users, regardless of cluster size.
According to this article:

> A Single CPU core will commonly handle an average of 220 to 250 concurrent connections simultaneously. If for instance, a website runs on a server that has a single CPU with 2 CPU cores, approximately 500 visitors may access and search the website at the same time.
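As a back-of-the-envelope sketch of that rule of thumb (the 220-250 per-core figure is quoted from the article above, not measured here):

```python
# Rough capacity estimate: ~220-250 concurrent connections per CPU core
# (figure from the quoted article; real throughput varies by workload).
CONNECTIONS_PER_CORE = (220, 250)

def estimated_concurrent_users(cores: int) -> tuple[int, int]:
    """Return a (low, high) estimate of concurrent connections for a node."""
    low, high = CONNECTIONS_PER_CORE
    return cores * low, cores * high

# A 2-core server: roughly 440-500 concurrent visitors.
print(estimated_concurrent_users(2))   # (440, 500)
# The 4 CPUs allocated to the minikube cluster in this thread:
print(estimated_concurrent_users(4))   # (880, 1000)
```

By this estimate, even the whole 4-CPU minikube allocation sits well below 2000 concurrent connections, which is consistent with the failures starting around that load.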
Also, each Online Boutique microservice has some K8s fields similar to (see related code):
resources:
requests:
cpu: 100m
memory: 220Mi
limits:
cpu: 200m
memory: 450Mi
You'll need to modify these values to support more users, increase the number of replicas (per Deployment), or set up K8s autoscaling.
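As one illustration of the autoscaling option, a minimal HorizontalPodAutoscaler sketch; targeting the `frontend` Deployment and the 80% CPU utilization threshold are illustrative choices here, not project defaults:

```yaml
# Scale the frontend Deployment between 1 and 10 replicas,
# targeting 80% average CPU utilization (requires metrics-server).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Note that the HPA scales against the `resources.requests` values above, so those still need to be set sensibly for autoscaling to work.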
> What are the specs on the machine running minikube?
AWS instance: m7g.2xlarge has 8 vCPUs, 32 GiB of memory.
> How much CPU/memory/etc. has been allocated to the minikube cluster?
CPUs=4, Memory=7800MB
> You mentioned that you're using 12 Nodes. Are those 12 Nodes just used for the loadgenerator?
Yes, c7g.16xlarge instances for load testing using Locust, with 1 master node and 11 worker nodes.
As suggested, modifying the resource requests and limits in the kubernetes-manifests.yaml file does support more users, but the right requests/limits differ for each microservice. So, I removed the resource requests and limits parameters from kubernetes-manifests.yaml and ran the Locust load test against a single-node minikube deployment on c7g.16xlarge. Now I can test the load for Arm64 on m7g.2xlarge and for x86_64 on m6i.2xlarge up to 6500 users without any failures.
@odidev It looks like your issue was resolved. I'll close this issue for now, but feel free to reopen it.