cf-for-k8s
Istio-Proxy sidecar resource requirements are excessive
Describe the bug
The resource requirements configured for the Istio-Proxy sidecar on app instances seem rather excessive, especially when compared to those of a small golang app. In our environment this has caused apps to be unschedulable due to resource constraints from the K8s scheduler/nodes.
To Reproduce
Steps to reproduce the behavior:
- cf push my-small-golang-app -m 16m
- kubectl -n cf-workloads describe pod/<app-instance-pod>
opi:
  Limits:
    ephemeral-storage: 64M
    memory: 16M
  Requests:
    cpu: 10m
    ephemeral-storage: 64M
    memory: 16M
istio-proxy:
  Limits:
    cpu: 2
    memory: 1Gi
  Requests:
    cpu: 100m
    memory: 128Mi
Expected behavior
Istio-Proxy should not have such excessive resource requests/limits set when compared to an app that itself only requests 10m CPU and 16Mi memory.
cf-for-k8s SHA
https://github.com/cloudfoundry/cf-for-k8s/tree/7c65597af7a4de935994813658a5db182fbecac9
Cluster information
PKS
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/174710760
The labels on this github issue will be updated when the story is started.
Hello @JamesClonk. Thanks for raising this issue.
We understand that you might have a small TKGI (PKS) cluster and only deploy apps with low memory usage. However, the sidecar proxy's memory usage doesn't correlate with the memory usage of the app itself, but rather with the traffic going to and from the app. So we cannot change that number based on the memory usage of the app.
Also, a quick reminder that right now cf-for-k8s has some minimum system requirements:
To deploy cf-for-k8s as is, the cluster should:
- be running Kubernetes version within range 1.16.x to 1.18.x
- have a minimum of 5 nodes
- have a minimum of 4 CPU, 15GB memory per node
You can read more about it in the deployment guide.
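If you want to double-check whether your PKS nodes meet these minimums, one quick way (assuming you have kubectl access to the cluster) is to list each node's reported capacity:
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'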
cc @kauana
Thanks @JamesClonk for submitting this issue and to @mike1808 and @kauana for your response.
Mike and Kauana, we have a couple of questions for you:
- Is this networking story related: "Platform operators can configure Istio component resource properties"?
- Would you recommend we keep this issue open for now or that we close it?
Hi @jamespollard8
- Yes, it's related. We're going to allow operators to modify Istio resource request/limits.
- Yes, let's keep this open and mark as a known issue.
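To give a rough idea of what that could look like: in stock Istio the sidecar defaults come from the global proxy settings, e.g. via an IstioOperator spec along these lines (purely a sketch of the upstream knob, with the current defaults as values; how cf-for-k8s will expose it is still to be decided):
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 100m        # current default request
            memory: 128Mi
          limits:
            cpu: "2"         # current default limit
            memory: 1Gi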
I have some doubts that allowing the platform operator to configure resource requirements for sidecars globally will solve the problem, at least unless we have a foundation that is only hosting apps with very similar network traffic.
Has any conceptual work been started on how we can scale the Envoy sidecar according to the application's needs? I am aware that this will be far from trivial to solve and might even require work in Kubernetes (first-class sidecar support) or Istio (there have at least been ideas about how to decouple Envoy from the application pods). Just curious if there have been any thoughts on this in the cf-k8s-networking team.
Hello @loewenstein
We haven't personally performed any tests to validate resource requirements for sidecars, and for now we're going to rely on the numbers from the Istio documentation:
- The Envoy proxy uses 0.5 vCPU and 50 MB memory per 1000 requests per second going through the proxy.
- Istiod uses 1 vCPU and 1.5 GB of memory.
- The Envoy proxy adds 2.76 ms to the 90th percentile latency.
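To make that concrete with a back-of-the-envelope example (illustrative only, not a tested recommendation): for an app expected to peak around 2000 requests per second, the figures above would suggest sidecar values in the ballpark of:
resources:
  requests:
    cpu: "1"        # 2 x 0.5 vCPU per 1000 req/s
    memory: 100Mi   # roughly 2 x 50 MB, not counting the proxy's fixed baseline
  limits:
    cpu: "2"
    memory: 256Mi   # some headroom above the steady-state estimate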
I was just saying that "per 1000 requests" will not make for an easy platform-wide configuration.
But I do understand that we currently don't have much of an option.
@loewenstein we are going to make a doc with our recommendation (based on our testing). However, it is not prioritized right now.
The workaround would be to manually override the Envoy proxy resource requests/limits in your pod template:
metadata:
  annotations:
    sidecar.istio.io/proxyCPU: "100m"
    sidecar.istio.io/proxyCPULimit: "1000m"
    sidecar.istio.io/proxyMemory: "1Gi"
    sidecar.istio.io/proxyMemoryLimit: "2Gi"
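For context, these annotations have to end up on the pod metadata, i.e. under spec.template.metadata.annotations of the workload that actually runs the app instances (in cf-for-k8s that is the Eirini-managed StatefulSet in the cf-workloads namespace). A minimal sketch of the placement, with illustrative values:
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "100m"
        sidecar.istio.io/proxyMemory: "128Mi"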
Hi @mike1808, we are in a similar situation, trying to work out the right compute resource allocation for the Envoy proxy sidecar; as the workloads increase, the cluster is being pushed into an overcommitted state. Have you published any recommendations regarding resource allocation?