kube-bench icon indicating copy to clipboard operation
kube-bench copied to clipboard

An example is missed, related to eventRecordQPS

Open winkee01 opened this issue 3 years ago • 6 comments
trafficstars

in node.yaml (e.g. this), there is a missing example related to eventRecordQPS, which makes it not consistent with other remediations.

 remediation: |
          If using a Kubelet config file, edit the file to set eventRecordQPS: to an appropriate level.
          If using command line arguments, edit the kubelet service file
          $kubeletsvc on each worker node and
          set the below parameter in KUBELET_SYSTEM_PODS_ARGS variable.
          Based on your system, restart the kubelet service. For example:
          systemctl daemon-reload
          systemctl restart kubelet.service

winkee01 avatar Mar 01 '22 10:03 winkee01

I don't know what is an appropriate level, after googling, I think eventRecordQPS should be set to 0.

winkee01 avatar Mar 01 '22 12:03 winkee01

@winkee01 Thanks for reporting.

The content of remediation was copied from the CIS Kubernetes Benchmark. And the blow parameter is the --event-qps flag.

I don't know what is an appropriate level, after googling, I think eventRecordQPS should be set to 0.

It depends on your environment and business. image -- 《CIS Kubernetes Benchmark v1.6.1》

mozillazg avatar Mar 01 '22 13:03 mozillazg

@winkee01 You should avoid 0 unless you're using the deprecated --event-qps command line.

This is confusing because when using the deprecated command line option, 0 is documented to use the default value of 5, whereas in the config file, a value of 0 is documented to disable rate limiting, which is precisely what this rule is supposed to prevent.

https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/options/options.go

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/v1beta1/defaults.go

joebowbeer avatar May 07 '22 00:05 joebowbeer

@mozillazg Note that these --event-qps / eventRecordQPS tests are wrong.

AFAICT, the tests are looking for a 0 in a deprecated flag (--event-qps) or in a kubelet config field (eventRecordQPS), however, the default value when both of these are missing is in fact 5, but if these are present, an explicit 0 has a very different meaning for the flag than for the config field.

For the deprecated command line flag, 0 means use the default value of 5, whereas in the config file, the default value if the field is missing is 5, but an explicit value of 0 disables rate limiting, which is precisely what this rule is supposed to prevent.

joebowbeer avatar Sep 16 '22 09:09 joebowbeer

Hey @joebowbeer !

an explicit value of 0 disables rate limiting, which is precisely what this rule is supposed to prevent.

I read the CIS benchmark the other way around - losing security events due to rate-limiting is the main issue the CIS benchmark wants to prevent here, as per their introductory line: "Security relevant information should be logged", and the rationale "it is important to not restrict event creation". Rate-limiting will indeed prevent events from being created (if there's too many of them), so rate-limiting should be disabled.

The discussion about kubelet denial of service is an important caveat, but ought to be addressed by scaling up the relevant logging infrastructure rather than by setting a rate-limit here.

Does that make sense to you as well?

Pluies avatar Sep 17 '22 09:09 Pluies

@Pluies I wrote more at https://github.com/awslabs/amazon-eks-ami/pull/391 where this comment aligns with my interpretation of the original intent.

The original recommendation was to set --event-qps to 0 or to tune it to a value appropriate for your cluster.

Setting --event-qps to 0 was equivalent to not setting it at all. Both cases defaulted to 5 qps, which according to some articles is appropriate for 30 pods, but YMMV.

Then the flag was deprecated and the CIS Benchmark recommendation (and kube-bench) became twisted to check that either the --event-qps flag is 0 or that eventRecordQPS in the kubelet-config is 0, but these have two completely different meanings.

eventRecordQPS=0 is an explicit override to disable rate limiting, whereas omitting this field defaults to 5, which is equivalent to setting --event-qps to 0.

I can't claim that disabling rate limiting is necessarily wrong, but it is obvious from the source code and history of changes that disabling all rate limiting is not the intent of the recommendation, and it should not become the new effective default for a cluster.

joebowbeer avatar Sep 17 '22 10:09 joebowbeer