
What happens if the CPU quota on the k8s side is less than 1 core?

Open timo-klarshift opened this issue 2 years ago • 7 comments

Am I right that if you have a service with a CPU limit of less than 1000m (i.e. 1 core), the Go process would still think it has a full core available, try to utilize more than its limit, and eventually get throttled? So my understanding here is that using limits of less than one core is a very non-ideal situation for the Go scheduler to work efficiently. Compare:

a.) 4 replicas with a 500m CPU limit
b.) 2 replicas with a 1000m CPU limit

In total, both cases use the same number of cores (2), but case b.) would be more efficient because the Go scheduler knows how much it can utilize?
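As a rough illustration of that assumption (a minimal sketch; the 2-vCPU node and 500m limit are assumed, and the exact output depends on the environment), the Go runtime sizes GOMAXPROCS from the host CPUs it can see, not from the cgroup limit, unless something sets it explicitly:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Inside a container with limits.cpu: 500m on a 2-vCPU node, both of
	// these still report 2 by default: the runtime sizes itself from the
	// host CPUs it can see, not from the cgroup quota.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```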

Sorry that I created a bug ticket for my simple question, but I think if my assumptions are correct, it would be good to make this clear in the README.

Thanks for the awesome library :+1:

timo-klarshift avatar Apr 14 '22 12:04 timo-klarshift

I'm currently exploring exactly this case:

  • k8s nodes are EC2 m5.large (2 vCPU)
  • the Pods have a CPU limit of 500m (i.e. 0.5)

My current observations are as follows: with this setup, the default value of GOMAXPROCS will be the number of host CPUs (2). Due to the CPU limit, the process will be throttled once the app spends a total of 50ms on the CPU per CFS period:

limit * cfs_period_us = 0.5 * 100ms = 50ms

Since the CFS quota is accounted across all threads of the app, in the worst case an app with GOMAXPROCS=2 will be throttled after both threads have spent 50 / 2 = 25ms on the CPU per 100ms period.

With that, in cases where the CPU limit is lower than the number of the host's CPU cores, GOMAXPROCS=1 can show more stable tail latency.
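To make that arithmetic concrete, here is a rough sketch assuming cgroup v1 mounted at the standard /sys/fs/cgroup path (an illustration only, not how automaxprocs itself detects the quota): it reads the CFS quota and period and derives the per-period and worst-case per-thread budgets.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// readInt reads a single integer from a cgroup v1 control file.
func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	// Standard cgroup v1 locations; adjust if your runtime mounts them elsewhere.
	quota, err := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") // e.g. 50000 for a 500m limit
	if err != nil {
		fmt.Println("no CFS quota visible:", err)
		return
	}
	period, err := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us") // default 100000
	if err != nil {
		fmt.Println("cannot read CFS period:", err)
		return
	}

	// Budget the whole process may spend on CPU per period, e.g. 50ms per 100ms.
	fmt.Printf("budget per period: %dus of %dus\n", quota, period)

	// Spread across GOMAXPROCS threads, the worst-case per-thread budget shrinks,
	// e.g. 50ms / 2 = 25ms when GOMAXPROCS=2.
	fmt.Printf("per-thread worst case: %dus\n", quota/int64(runtime.GOMAXPROCS(0)))
}
```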

narqo avatar May 04 '22 07:05 narqo

Yes, from what I've seen, fractional quotas don't play well with Go, as it attempts to consume an integer number of cores (even if the application itself doesn't consume CPU, the GC will eventually consume available CPU up to GOMAXPROCS). This can and does lead to throttling. With a 0.5 quota and GOMAXPROCS=1, the process could end up consuming all of the available quota in the first half of the period, so it's possible to be throttled for the rest of the cfs_period (default 100ms, hence the 50ms of throttling @narqo mentioned above).
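The common wiring for the library is the side-effect import shown in this minimal sketch (the behaviour described in the comments reflects my reading of the README: the detected quota is rounded down and floored at 1, and an explicitly set GOMAXPROCS environment variable is respected):

```go
package main

import (
	"fmt"
	"runtime"

	// Side-effect import: the package's init() adjusts GOMAXPROCS to the
	// detected container CPU quota (rounded down, never below 1).
	_ "go.uber.org/automaxprocs"
)

func main() {
	// With a 500m limit this is expected to print 1; with no quota it is
	// left at the default (the number of host CPUs).
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```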

prashantv avatar May 09 '22 05:05 prashantv

Okay, thanks for the confirmation. We ended up providing at least one core per replica to avoid the above situation.

timo-klarshift avatar May 09 '22 07:05 timo-klarshift

@timo-klarshift When you say cpu quota, which one do you mean in k8s: limits.cpu or requests.cpu?

rukolahasser avatar May 19 '22 04:05 rukolahasser

> When you say cpu quota, which one do you mean in k8s

I expect this was about limits.cpu. requests.cpu is for scheduling and for weighting the workload, but I'm not sure how the latter translates into throttling on the app's side

narqo avatar May 19 '22 21:05 narqo

> When you say cpu quota, which one do you mean in k8s
>
> I expect this was about limits.cpu. requests.cpu is for scheduling and for weighting the workload, but I'm not sure how the latter translates into throttling on the app's side

Gotcha, I tested it myself and it should be limits.cpu

rukolahasser avatar May 20 '22 07:05 rukolahasser

> When you say cpu quota, which one do you mean in k8s
>
> I expect this was about limits.cpu. requests.cpu is for scheduling and for weighting the workload, but I'm not sure how the latter translates into throttling on the app's side

requests never lead to throttling. In the cgroup v1 implementation in the Kubelet (well, Kernel), requests are a guarantee of weighted time slices on the host (across all available cores). See https://www.youtube.com/watch?v=8-apJyr2gi0 for details.
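For reference, a rough sketch of how these fields map onto cgroup v1 controls (the constants follow the conventional kubelet translation; exact behaviour can vary by version, so treat the numbers as an assumption): requests.cpu becomes a relative cpu.shares weight, which never throttles, while limits.cpu becomes an absolute cpu.cfs_quota_us over cpu.cfs_period_us, which does.

```go
package main

import "fmt"

// Rough cgroup v1 translation of Kubernetes CPU fields, in millicores.
// requests.cpu -> cpu.shares (relative weight, never throttles)
// limits.cpu   -> cpu.cfs_quota_us over cpu.cfs_period_us (hard cap, throttles)
func sharesFromRequest(milliCPU int64) int64 {
	// 1000m conventionally maps to 1024 shares in cgroup v1.
	return milliCPU * 1024 / 1000
}

func quotaFromLimit(milliCPU, periodUS int64) int64 {
	// 500m with the default 100000us period yields 50000us of CPU per period.
	return milliCPU * periodUS / 1000
}

func main() {
	fmt.Println("requests.cpu=500m -> cpu.shares =", sharesFromRequest(500))
	fmt.Println("limits.cpu=500m   -> cpu.cfs_quota_us =", quotaFromLimit(500, 100000))
}
```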

embano1 avatar Apr 29 '23 18:04 embano1