
The linkerd-init initContainer CPU request is too high

Open cparadal opened this issue 1 year ago • 5 comments

What is the issue?

The configured CPU request for linkerd-init is currently set to 100m, which is far too high for what that container does.

The CPU request for linkerd-init was increased last year from 10m to 100m, in what looks like an attempt to preserve the Guaranteed QoS class for meshed pods. Since linkerd-init's only job appears to be setting a few iptables rules and exiting, it shouldn't need such a high request.

Depending on the cluster size and the number of meshed pods, this can cause the cluster node count to grow substantially, with the associated additional monetary cost that comes with it. This is a consequence of the way initContainer resources are accounted for.

We lowered this (request/limit) to 10m/10m in our environments and haven't seen any issues.

How can it be reproduced?

When a substantial number of pods are meshed, the cluster node count grows significantly due to the extra request burden imposed by the injected linkerd-init/linkerd-proxy containers.

Logs, error output, etc

N/A

output of linkerd check -o short

N/A

Environment

  • Kubernetes version: 1.26
  • Cluster environment: kubeadm-deployed clusters on AWS (+ ClusterAutoscaler)
  • Host OS: Linux
  • Linkerd version: stable-2.13.6

Possible solution

Decrease CPU request for linkerd-init initContainers.

Additional context

No response

Would you like to work on fixing this bug?

maybe

cparadal avatar Aug 31 '23 11:08 cparadal

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 01 '23 00:12 stale[bot]

Thanks for reporting this. I agree this is a problem as it increases costs. We cannot change the request back to a value that does not match the limit without second-order consequences elsewhere. I guess the key question here is whether the limit needs to be as high as 100m. I personally think it makes sense to lower this value (i.e. I would have advocated for the reverse change: set both request and limit to the lower value of 10m), as the only side effect would be slower startup time, which could always be addressed by configuring a higher limit. It comes down to a judgement call on what is preferred by default: optimal startup time or minimal resource usage across the cluster.
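For reference, Kubernetes only assigns the Guaranteed QoS class when every container in the pod, init containers included, has CPU and memory requests equal to its limits. A minimal sketch with illustrative values (not Linkerd defaults, and the image tag is a placeholder):

  initContainers:
    - name: linkerd-init
      image: cr.l5d.io/linkerd/proxy-init:<version>   # placeholder tag
      resources:
        requests:
          cpu: 10m      # request == limit on every container keeps the
          memory: 10Mi  # pod eligible for the Guaranteed QoS class
        limits:
          cpu: 10m
          memory: 10Mi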

I'll leave this open to see if anyone else feels strongly about this.

DavidMcLaughlin avatar Dec 04 '23 18:12 DavidMcLaughlin

We've hit this issue too with the 100m CPU request. We have a preview cluster that runs the same number of pods as the production cluster, but with no traffic on it, so the pods are not consuming anything. We set CPU requests to very low values there so the preview can run on a low-budget cluster. However, because linkerd puts a 100m CPU request on the init container, we are running out of allocatable CPU for requests while the cluster is using only 15% of its actual CPU.

I dug in and found that this was changed after #7980. The motivation is unclear. Maybe @mikebell90 can provide more information on what the issue was and why setting a 100m CPU request solved it. What would be the consequence of reverting to a 10m CPU request? I'm also pinging @kleimkuhler, who was involved in that issue.

I'm trying to find a workaround before any change is released in Linkerd. In #7980 we can see how to change the value when deploying linkerd. Is there a way to change it afterwards?
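For reference, this is the kind of deploy-time override #7980 points at. A sketch only, assuming the chart still exposes the proxyInit.resources.* keys (verify with helm show values for your chart version); as far as I can tell, feeding the same values file to helm upgrade, or to linkerd upgrade if your CLI accepts Helm-style overrides, should also change it on an existing installation:

  # values-override.yaml -- example values, not an official recommendation;
  # key names may differ between chart versions
  proxyInit:
    resources:
      cpu:
        request: 10m
        limit: 10m
      memory:
        request: 10Mi
        limit: 10Mi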

couloum avatar Mar 20 '24 16:03 couloum

You should never have unbounded or unset requests, or the node cannot fulfill the requests. Simple as that. If you feel like living semi-dangerously, override the defaults. But my opinion is that a) it's good to have conservative requests, and b) it's good to have requests and limits set and equal by default. The annotations presumably still work and make changing this (assuming you have a standard deployment template) pretty easy.

But sadly I no longer use linkerd so I'm happy with whatever community decides

mikebell90 avatar Mar 20 '24 16:03 mikebell90

I think you're asking for it to be set to 10m/10m. That's fine. If memory serves, it was originally request 10m / limit 100m, and I merely asked for them to be made identical -- otherwise the WHOLE POD loses its QoS class, which is annoying. If you prefer 10m/10m, all good.

mikebell90 avatar Mar 20 '24 16:03 mikebell90

This does feel a bit high. 100m is basically 1/10th of a CPU, IIRC.

This gets into an interesting state when playing around. I have a dev k3s cluster I spun up, which I apparently gave 1 CPU in Proxmox. It's sitting at 10% usage, but you can't provision anything else now: I installed Linkerd and its visualisation component, which together demand 80% of a core, and coredns and metrics-server demand the remainder.

Even if the control plane needs this much, does the visualisation component really require half a core to run? It takes more than Linkerd itself!

  Namespace                   Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                 local-path-provisioner-6c86858495-p4bdg    0 (0%)        0 (0%)      0 (0%)           0 (0%)         144m
  kube-system                 svclb-traefik-7fbd863e-fsckf               0 (0%)        0 (0%)      0 (0%)           0 (0%)         144m
  kube-system                 coredns-6799fbcd5-w7zxl                    100m (10%)    0 (0%)      70Mi (1%)        170Mi (4%)     144m
  kube-system                 metrics-server-54fd9b65b-s6s25             100m (10%)    0 (0%)      70Mi (1%)        0 (0%)         144m
  kube-system                 traefik-7d5f6474df-jhxrn                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         144m
  linkerd                     linkerd-identity-588dcf4dd5-tcntd          100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      74m
  linkerd                     linkerd-proxy-injector-74c4d66465-bvjmq    100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      74m
  linkerd                     linkerd-destination-c7b69f586-knms2        100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      74m
  linkerd-viz                 metrics-api-6779cc6c8d-wsg48               100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      71m
  linkerd-viz                 web-7f9fdc49fb-6vzhm                       100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      71m
  linkerd-viz                 tap-656c58cddf-9f258                       100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      71m
  linkerd-viz                 tap-injector-6b56c7786b-jt2bk              100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      71m
  linkerd-viz                 prometheus-847fc8fdb6-g27v4                100m (10%)    100m (10%)  20Mi (0%)        20Mi (0%)      71m

I realise this might be a different topic now, but it has a similar vibe: the CPU request values do seem excessive across the board.

btrepp avatar May 25 '24 08:05 btrepp

Any reason not to just set this to 10m/10m and be done with it?

wmorgan avatar Jun 04 '24 16:06 wmorgan

Any reason not to just set this to 10m/10m and be done with it?

I can't think of any. I agree with the rest of the people in the thread that 100m is a bit excessive. It's reasonable not to expect proxy-init to take up a tenth of a CPU just to call into a binary. It seems that in https://github.com/linkerd/linkerd2/pull/7989 we simply opted to be conservative: instead of lowering the limit, we raised the request.

Raising the requests—instead of lowering the limits—felt like the safer option here. This means that the container will now always be guaranteed these amounts and will never use more.

10m/10m should be fine; it seems the OP ran with that configuration and didn't encounter any problems. It also maintains the Guaranteed QoS class, so there shouldn't be any unexpected behaviour when upgrading.

@btrepp

I realise this might be a different topic now, but it has a similar vibe: the CPU request values do seem excessive across the board.

Hm, unless I'm misunderstanding something it's the same topic, so you're fine! :)

The viz stack itself doesn't really set any default resource requests/limits AFAICT (e.g. see metrics-api). The culprit is none other than proxy-init, which ends up requiring half a core just to schedule the entire stack (five injected viz pods × 100m = 500m). I agree this is not ideal.

mateiidavid avatar Jun 10 '24 15:06 mateiidavid

Thanks all! We've now marked this as completed; expect the next edge release to contain the necessary change.

We experimented with a couple of values to see what the effects would be. We found that on average, the init container consumed around 50m CPU units. In certain cases, the container used more than 50m. To protect against throttling (which would slow down the rollout of any injected workload) we decided to take a different approach.

Instead of prescribing a value that all environments should use, we default to the proxy's resources.

As the Kubernetes documentation on init container resources describes:

The highest of any particular resource request or limit defined on all init containers is the effective init request/limit. If any resource has no resource limit specified, this is considered the highest limit. The Pod's effective request/limit for a resource is the higher of:

  • the sum of all app containers' requests/limits for that resource
  • the effective init request/limit for that resource
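To make that concrete with made-up numbers: suppose an app container requests 50m CPU, the injected proxy has no request configured, and linkerd-init keeps the old 100m default.

  # Hypothetical meshed pod (CPU only, for brevity):
  #   effective init request      = 100m            (highest init container)
  #   sum of app container reqs   = 50m + 0m = 50m  (app + proxy)
  #   pod effective CPU request   = max(50m, 100m)  = 100m
  # The scheduler reserves 100m, driven entirely by linkerd-init. Across 500
  # such pods that is 50 cores of requests instead of 25, which is where the
  # extra nodes come from.
  spec:
    initContainers:
      - name: linkerd-init
        resources:
          requests:
            cpu: 100m   # old default
    containers:
      - name: app
        resources:
          requests:
            cpu: 50m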

The init container will request the same amount of memory and CPU as the proxy. We allocate those resources for the proxy anyway, and since the init container runs first, they are released for the proxy to use once it exits. This holds even when the native sidecar feature is used, since init containers are executed sequentially.

To ensure pods are Guaranteed, clusters need to specify proxy requirements anyway. By default, init containers will not request any resources unless proxy resources are specified in the values file. This should fix scheduling constraints for smaller environments.
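As a sketch of what this looks like in practice (made-up numbers; the exact value keys may vary by chart version): set proxy resources and linkerd-init mirrors them, so the pod's effective request never rises above what the app containers, proxy included, already ask for; leave them unset and linkerd-init requests nothing.

  # Example proxy resources; with the new behaviour the injected
  # linkerd-init container is expected to receive the same requests/limits,
  # so max(init, sum of app containers) adds nothing on top of the proxy.
  proxy:
    resources:
      cpu:
        request: 100m
        limit: 100m
      memory:
        request: 20Mi
        limit: 250Mi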

I think ultimately this is going to fix everyone's problem and remove some configuration knobs that are hard to get right. If anyone has any questions or concerns, let us know.

mateiidavid avatar Jun 24 '24 11:06 mateiidavid