
Your posts about CPU limit

Open aguzmans opened this issue 3 years ago • 2 comments

I saw your posts about CPU limits. The posts are good, and the insight that CPU is renewable is a great analogy to explain it; if you do not mind I will "borrow" it. I am reaching out with the aim of hearing your points against my opinion, and I will run some further tests.

Now with that said, I have a bit of an issue with the way the CPU limit is portrayed as somewhat evil here and here. I mean, you might be right, but my experience is that very low limits create problems, no limits is the wild west, and high enough limits are good and save hardware too. Sometimes when your program is throttled it deserves to be throttled, and you should check HPA, the program's configs, and other things. A request of, say, 1 CPU with a limit of 2 CPUs is almost always too low, but you can get to a formula that does not let hoarders take the whole CPU of the node, yet still lets them take anything that is not vital to everyone else on the day they need to hoard a bit.

For instance: "Markus won't let Teresa drink it because her limit is 1 liter per day so she dies of thirst." CPU is not limited to 1 liter just because k8s says you have one liter as the limit. The program can and does still ask for CPU if it needs it; program != pod config. It's more like being limited to 1 liter if everyone else is depleting the CPU at that time; if your program wants/needs more than your limit, you will likely get an amount proportional to your limit compared to other deployments' limits and what they need at that moment. Also, death would be the wrong analogy: death would be like OOM (hitting the memory limit). What actually happens is more like a pause in living, or sleep without dreams xD, precisely because CPU does not END (it's renewable); it is used up at a given moment in time or not, but the next scheduling period it's distributed again to the needy.

With that said, yes, limits cause problems sometimes. I prefer limits that do not compromise the node's whole CPU but allow pods to go well over the request at a given time.
Now with that said, programs that regularly/often need more than 1/2 of your node should be analyzed, and you should, if at all possible, have programs that consume much less than half your node. In real life it's complicated, TBH, but if a program regularly consumes 1/2 of a node, then it probably deserves a node to itself.

Balance in the force... is what I look for.

For instance, I personally tend to start with limits around these proportions, say when I do not know anything about the program (but in our company we load test, or should, before going to prod): requests.cpu: "250m" and limits.cpu: "1000m" (laid out below). That is a limit of 4x the request, but I sometimes make it much higher depending on how the monitoring says the program behaves. This particular program uses about 80% of the request when it has some load and can consume much more sometimes, but it scales via HPA at around 80%, so it does not matter if it goes to 200% for a short time (like your example of 100m vs. 200m) while it scales and the limit allows it. This gives the program a certain freedom, and the values should be increased or reduced depending on the program's behaviour over time, node size, etc. Of course, between a low limit and no limit I would probably go with no limit to start and then adjust.
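Here is that same stanza laid out as it would appear in a pod spec (only the CPU fields shown):

```yaml
resources:
  requests:
    cpu: "250m"   # scheduling weight / soft share
  limits:
    cpu: "1000m"  # hard cap: 4x the request
```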

aguzmans avatar Aug 08 '22 02:08 aguzmans

Hi, apologies for the delay in responding.

Regarding your comment here:

It's more like being limited to 1 liter if everyone else is depleting the CPU at that time; if your program wants/needs more than your limit, you will likely get an amount proportional to your limit compared to other deployments' limits and what they need at that moment.

What you're describing is "soft limits" and it's obviously a desirable behaviour! To the best of my knowledge, the only way to achieve that is to set a request (despite the name, requests function as soft limits) and not to set a CPU limit. (Limits function as hard limits.) Am I missing something?
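In manifest terms, that looks something like this (a minimal sketch with illustrative values):

```yaml
# Illustrative sketch: request only, no CPU limit.
# The request acts as a proportional soft limit; the container can
# burst above it whenever the node has idle CPU.
resources:
  requests:
    cpu: "250m"
  # no limits.cpu -> no hard cap, no CFS throttling
```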

aantn avatar Aug 15 '22 21:08 aantn

Hi, thanks very much for your response. I appreciate the time you have put into it and the fact that you came back to me at all.

You are right about how it is portrayed and described in the docs for Linux, CFS, cgroups, Docker, and Kubernetes. At the cgroups level there is cpu.shares, which is what implements the "soft limit" we have been discussing for requests. Then, some 12 years ago, cpu.cfs_period_us and cpu.cfs_quota_us were introduced with the objective of creating a hard limit, and that is what K8s limits use too. cpu.cfs_period_us is the period of time used for scheduling, normally 100000 us (microseconds), and cpu.cfs_quota_us sets the slice of that period that the cgroup's processes (normally a container's cgroup) can use.
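Roughly, the mapping looks like this (a sketch using the same numbers as my earlier example; the exact wiring depends on the runtime and on cgroup v1 vs. v2):

```yaml
# Sketch of how CPU request/limit map to cgroup v1 CFS settings.
resources:
  requests:
    cpu: "250m"   # -> cpu.shares ~ 256 (250/1000 * 1024): proportional weight when CPU is contended
  limits:
    cpu: "1000m"  # -> cpu.cfs_quota_us = 100000 with cpu.cfs_period_us = 100000: hard cap per period
```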

Now, in practice this should work similarly to what you explained above, but for some reason I had a different personal observation a while back. I was probably "mis-observing" something, and I do not have evidence to show for that "story"; it can become folklore in my mind now xD. Let's discard what I think I observed for now and go to the important part... to limit or not to limit.

I understand what you are saying about the limits, and often that is good for workloads. I see there is also buzz on social media about not using limits. I understand and support some of the arguments for it, and for many cases I would not use limits myself. With that said, CFS's cpu.cfs_period_us and cpu.cfs_quota_us were introduced because they (at Google) were having problems with only setting cpu.shares (the equivalent of today's request, also known as proportional shares; again, this was around 12 years ago). Notice also that there are other schedulers, but today's "standard" is CFS, so I will limit my explanation to it. The people who wrote the doc at the time said:

_It should be noted that the concept of proportional shares is different from a guarantee. Assigning share to a group doesn't guarantee that it will get a particular amount of CPU. It only means that the available CPU bandwidth will be divided as per the shares. Hence, depending on the number of groups present, the actual amount of CPU time obtained by groups can vary._

Then they further say:

_While for many use cases, efficient use of idle CPU cycles like this might be considered optimal, there are two key side effects that must be considered:_

  1. _The actual amount of CPU time available to a group is highly variable as it is dependent on the presence and execution patterns of other groups; a machine can then not be predictably partitioned without intimately understanding the behaviors of all co-scheduled applications._
  2. _The maximum amount of CPU time available to a group is not predictable. While this is closely related to the first point, the distinction is worth..._

Therefore they invented the CPU bandwidth control system, which led to what we today know as limits in Kubernetes. This is a very simplified version of the story; read the link for more information. Now, those two problems would still be present if you only set requests, at least according to my current understanding of the problem, and that is consistent with what I have seen so far. Again, not an expert, just a relatively NEW fan boy... xD. Anything that could hit those problems needs a limit, and that is almost everything except for REALLY important apps where, if they lose a few milliseconds, the business goes PUFFF and I get fired. I tend to run those VIP apps on dedicated nodes. Everything else I tend to set a limit for, and that limit is often very high: at minimum about 3x to 4x the request, and even 10x+ depending on the program's behaviour, load tests if they are done, and the importance of the app.
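To make the bandwidth-control mechanics concrete, here is a rough sketch with assumed numbers (not a measurement):

```yaml
# Rough sketch with assumed numbers.
resources:
  limits:
    cpu: "2"   # -> cpu.cfs_quota_us = 200000 per 100000 us period
# If the container's threads want ~3 CPUs' worth of work in a period,
# they burn 200 ms of CPU time, are throttled for the rest of the period,
# and get fresh quota when the next period starts: the "pause, not death"
# behaviour from earlier in the thread.
```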

Aside: if you are in a shared or multi-tenant environment with somewhat independent actors, you need limits. Especially if costs are important and there are different teams who "share" the cluster but must be billed separately; in that case limits are even more important.
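On top of per-pod limits, something like a per-namespace quota helps keep each team's spend bounded (a sketch; the namespace and numbers are made up for illustration):

```yaml
# Sketch only: names and numbers are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-cpu-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"   # total CPU the team's pods may request
    limits.cpu: "16"    # total CPU the team's pods may burst to
```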

With all this said, I am bouncing these ideas off people outside my bubble to reach a better understanding and to learn; these opinions are not strong, not written in stone, and I do not claim to know anything for sure.

aguzmans avatar Aug 16 '22 03:08 aguzmans