Provide mechanism to allow containers to run without CPU limits

Open otterley opened this issue 1 year ago • 7 comments

Description of the feature or enhancement:

Provide a mechanism to allow containers to run without CPU or memory limits. Currently, a default limit is applied to all containers in a Pod as a result of changes made in #1082. You can specify higher limits, but you can't remove the limits altogether. I also recommend making "unlimited" the default for CPU.

Use Case:

Running containers with limits can, in certain cases, cause a nontrivial amount of time to be spent in the kernel maintaining that limit, due to bugs in older Linux kernels [1][2]. This typically happens with very large containers, e.g., 32 vCPUs or more, and decreases overall workload performance. The bugs have since been patched in newer kernel versions (in both 4.x and 5.x), but many customers may be unaware that they are running an impacted kernel.

Moreover, forcing a container to run with limits means it cannot take advantage of free CPU cycles to "burst" beyond its usual resource requirements. In many cases this leaves nodes underutilized, which results in economic waste and higher operational costs than a given workload would otherwise require.

A number of well-intended but misinformed "best practice" materials and evaluation tools state that CPU limits are the correct and proper way to prevent a container from becoming a "noisy neighbor" and impacting the performance of other containers running on a node [3][4]. However, this is incorrect. Resource requests provide the same feature, but in reverse: if a "bursty" container is using more than its requested CPU, the Linux kernel will automatically throttle it back to its requested CPU allocation if another container needs access to the CPU resource (up to the latter's own requested value). That's why requests are also known as guarantees: they ensure the container can always get access to the resource, even if it's currently being used by something else. They set a floor, not a ceiling. And since every container is guaranteed the amount of CPU that it requests, limits aren't necessary to make that effective.
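To make the "floor, not a ceiling" argument concrete, here is a rough sketch of how CPU gets divided on a contended node when requests act as weights. This is a toy model, not the actual kernel scheduler: `allocateCpu`, the `Container` shape, and all the numbers are made up for illustration, and it assumes the sum of requests fits within node capacity (which the Kubernetes scheduler normally ensures).

```typescript
// Toy model of request-weighted, work-conserving CPU sharing.
// All names here are hypothetical; this is not a cdk8s-plus or kernel API.
interface Container {
  name: string;
  requestMillis: number; // CPU request in millicores (used as the weight)
  demandMillis: number;  // CPU the workload would use if unconstrained
  limitMillis?: number;  // optional CPU limit (hard ceiling)
}

function allocateCpu(capacityMillis: number, containers: Container[]): Map<string, number> {
  const alloc = new Map<string, number>();
  for (const c of containers) alloc.set(c.name, 0);

  // A container never receives more than it wants, or more than its limit.
  const ceiling = (c: Container) => Math.min(c.demandMillis, c.limitMillis ?? Infinity);

  let remaining = capacityMillis;
  let active = containers.filter((c) => ceiling(c) > 0 && c.requestMillis > 0);

  // Hand out capacity in proportion to requests; whatever a satisfied
  // container leaves unused is redistributed among those still wanting more.
  while (remaining > 1e-9 && active.length > 0) {
    const totalWeight = active.reduce((sum, c) => sum + c.requestMillis, 0);
    let handedOut = 0;
    const stillHungry: Container[] = [];
    for (const c of active) {
      const fairShare = remaining * (c.requestMillis / totalWeight);
      const current = alloc.get(c.name)!;
      const grant = Math.min(fairShare, ceiling(c) - current);
      alloc.set(c.name, current + grant);
      handedOut += grant;
      if (ceiling(c) - (current + grant) > 1e-9) stillHungry.push(c);
    }
    remaining -= handedOut;
    active = stillHungry;
  }
  return alloc;
}

// 4-CPU node: a bursty container next to a neighbor that needs its full request.
const noLimit = allocateCpu(4000, [
  { name: "bursty", requestMillis: 1000, demandMillis: 4000 },
  { name: "neighbor", requestMillis: 1000, demandMillis: 1000 },
]);
console.log(noLimit.get("bursty"), noLimit.get("neighbor")); // 3000 1000

// Same node, but the bursty container now has a 1-CPU limit.
const withLimit = allocateCpu(4000, [
  { name: "bursty", requestMillis: 1000, demandMillis: 4000, limitMillis: 1000 },
  { name: "neighbor", requestMillis: 1000, demandMillis: 1000 },
]);
console.log(withLimit.get("bursty"), withLimit.get("neighbor")); // 1000 1000
```

In both runs the neighbor receives its full 1000m request. The only thing the limit changes is that 2000m of the node sits idle instead of being used by the bursty container.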

See also https://home.robusta.dev/blog/stop-using-cpu-limits for more background.

Proposed Solution:

~Add a Cpu.UNLIMITED constant to cdk8s-plus that removes the CPU limit.~ Remove the default CPU limit so that container performance is not unnecessarily throttled.

Other:

[1] https://static.sched.com/hosted_files/kccncna19/dd/Kubecon_%20Throttling.pdf

[2] https://github.com/kubernetes/kubernetes/issues/67577

[3] https://github.com/zegl/kube-score/blob/master/README_CHECKS.md

[4] https://polaris.docs.fairwinds.com/checks/efficiency/

  • [ ] :wave: I may be able to implement this feature request
  • [ ] :warning: This feature might incur a breaking change

This is a :rocket: Feature Request

otterley • Feb 23 '23 02:02