
[BUG] Review and justify the usage and requirements for the Resource Policies test.

Open denverwilliams opened this issue 3 years ago • 7 comments

Describe the bug: Review the Resource Policies Kubescape test and consider how to deal with different environments that have different resources. This test seems to require a generic limit that is applied to all environments, no matter the size of the host system.

Vendors are setting limits based on the environment they are deploying to and not something generic. If they set something, it will be arbitrary.



How will this be tested? aka Acceptance Criteria (optional)

(optional: unnecessary for things like spelling errors and such)

Once this issue is addressed, how will the fix be verified?



NOTE: You can enable a higher logging level via the command line or an environment variable to help with debugging.

# cmd line
./cnf-testsuite -l debug test

# make sure to use -- if running from source
crystal src/cnf-testsuite.cr -- -l debug test

# env var
LOGLEVEL=DEBUG ./cnf-testsuite test

Setting the verbose option for many tasks also adds extra output to help with debugging:

crystal src/cnf-testsuite.cr test_name verbose

Check the usage documentation for more information about invoking commands and logging.

denverwilliams • Jul 07 '22 19:07

Resource limits should be set for every container or a namespace to prevent resource exhaustion. This test is also relevant from a security standpoint, since it offers a level of protection against denial-of-service attacks (including DDoS): a container that tries to use more memory than its limit is killed, and one that tries to use more CPU than its limit is throttled, but if no limits are defined, the container can keep consuming resources without bound.

Vendors are setting limits based on the environment they are deploying to and not something generic. If they set something, it will be arbitrary.

  • This test currently checks only that resource limits are configured; it ignores the actual values, so setting any limit is enough to pass. That makes the test somewhat gameable, since a vendor can set arbitrary limits just to pass, but it at least indicates that limits are being used. Rather than being grounds for removing the test, this leaves room for improvement: ideally, the CNF would pass only if limits are both set and specific rather than arbitrary.
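A minimal sketch of the presence-only check described above, assuming the pod spec is available as a dict shaped like the Kubernetes manifest (`containers[].resources.limits`). The function name is hypothetical and this is not how cnf-testsuite or Kubescape implement the control; it also ignores init containers and namespace-level defaults such as a LimitRange.

```python
from typing import Any

# Hypothetical helper, not the cnf-testsuite or Kubescape implementation.
# The pod spec is a plain dict shaped like the Kubernetes JSON/YAML form.
REQUIRED_LIMITS = ("cpu", "memory")


def containers_missing_limits(pod_spec: dict[str, Any]) -> list[str]:
    """Return the names of containers without both a CPU and a memory limit.

    Only the presence of the keys is checked; the values are ignored, so any
    limit (however arbitrary) counts as "set".
    """
    missing = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if not all(name in limits for name in REQUIRED_LIMITS):
            missing.append(container["name"])
    return missing


pod_spec = {
    "containers": [
        {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "sidecar", "resources": {"limits": {"memory": "64Mi"}}},
        {"name": "debug"},
    ]
}

print(containers_missing_limits(pod_spec))  # ['sidecar', 'debug']
```

Because `"500m"` and `"999999Gi"` pass equally, this form of the check cannot tell a considered limit from an arbitrary one.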

Consider how to deal with different environments that have different resources. This test seems to require a generic limit that is applied to all environments, no matter the size of the host system.

  • Resource limits configured for a CNF apply the same way across the cluster, but they do not directly depend on the underlying environment, even when its nodes have different resources. Setting CPU and memory limits declares the resource requirements of your CNF, which should be known in advance; the scheduler uses these values (strictly, the requests, which default to the limits when only limits are set) to find a node with enough available resources. In other words, this test already handles environments with different resources.
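A simplified sketch of the scheduler's resource-fit check, to show why the declared values do not need to change per environment: the same numbers are compared against whatever each node has left. Strictly, the scheduler compares requests; when a container sets only a limit (and no admission-time default applies), Kubernetes copies the limit into the request, which `effective_requests` mimics. Both function names are hypothetical, quantities are plain integers (CPU in millicores, memory in MiB), and init containers, pod overhead and extended resources are ignored.

```python
# Simplified model of the scheduler's resource-fit check (hypothetical helpers).
# Quantities are integers: CPU in millicores, memory in MiB.


def effective_requests(container: dict) -> dict[str, int]:
    """Requests used for scheduling; a resource with only a limit uses the limit."""
    resources = container.get("resources", {})
    requests = dict(resources.get("requests", {}))
    for name, value in resources.get("limits", {}).items():
        requests.setdefault(name, value)  # limit copied into the missing request
    return requests


def pod_fits(containers: list[dict], allocatable: dict[str, int],
             already_requested: dict[str, int]) -> bool:
    """True if the node's allocatable capacity covers existing plus new requests."""
    totals: dict[str, int] = {}
    for container in containers:
        for name, value in effective_requests(container).items():
            totals[name] = totals.get(name, 0) + value
    return all(
        already_requested.get(name, 0) + value <= allocatable.get(name, 0)
        for name, value in totals.items()
    )


cnf = [{"name": "cnf", "resources": {"limits": {"cpu": 2000, "memory": 4096}}}]
small_node = {"cpu": 4000, "memory": 8192}
busy = {"cpu": 3000, "memory": 2048}

print(pod_fits(cnf, small_node, {}))    # True
print(pod_fits(cnf, small_node, busy))  # False: 3000m + 2000m > 4000m
```

The `cnf` definition is evaluated unchanged against each node; only the node's remaining capacity differs.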

denverwilliams • Jul 07 '22 21:07

Another scenario to consider: CNFs that perform better and/or accomplish more when they can use more resources and the host has more resources available.

Example:

  • CNF that analyzes data packets. More CPU and memory means more packets analyzed per second.
  • Pod is scheduled to host X1. X1 has Y1 memory and CPU available.
  • Pod is later rescheduled to host X2. X2 has Y1*2 memory and CPU available.

Ideally, the CNF can use more memory after being rescheduled.

If the limit is set to the resources available on X2, then when the pod is scheduled to X1, its limit will be set too high.

It seems like the limit should not be hard-coded in all cases.

taylor • Jul 07 '22 21:07

Here is how I see it working declaratively to utilize available resources in a friendly and secure way.

  • CNF announces to K8s that it prefers having lots of resources available, but supports scaling down to what is available.

CNF communicating with K8s to ask for what it wants and use what it gets:

  • CNF: I prefer lots of memory and CPU. The most, please.
  • K8s: You can have this small node, sorry.
  • CNF: Okay, I can make that work.
  • CNF: Oh, we have more traffic. Can we scale up, please?
  • K8s: Sure, here are 2 more small nodes.
  • CNF: Okay, we can scale horizontally too.
  • CNF: More traffic!
  • K8s: Oh, here is a big node with lots of CPU available.
  • CNF: Cool, we can scale vertically too.
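The horizontal steps in this exchange map roughly onto the Kubernetes HorizontalPodAutoscaler, whose documented core rule is desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]. Below is a minimal sketch of only that rule plus min/max clamping; the function name and the default bounds (1 and 10) are illustrative assumptions. It changes pod counts, not node counts: adding nodes ("here are 2 more small nodes") would be a cluster autoscaler's job, and vertical scaling is not modeled.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Core HPA formula: ceil(current_replicas * current_metric / target_metric).

    The result is clamped to [min_replicas, max_replicas]. Tolerance,
    readiness handling and stabilization windows are not modeled.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(1, 180, 60))  # 3: traffic tripled, scale out
print(desired_replicas(3, 30, 60))   # 2: load fell, scale back in
print(desired_replicas(4, 300, 60))  # 10: capped by max_replicas
```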

taylor • Jul 07 '22 21:07

Ideally, percentages could be used: give me 10% of the memory, or give me 25% of the memory, as a limit.
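Kubernetes `resources.limits` only accepts absolute quantities such as `2048Mi`, so a percentage would have to be converted into an absolute value by some tooling before the manifest is applied. A minimal sketch of that conversion, assuming the node's allocatable memory in bytes is known; the function name is hypothetical. The usage lines reuse the X1/X2 example from earlier, taking Y1 as 8 GiB (an arbitrary illustrative value).

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def memory_limit_from_percent(node_allocatable_bytes: int, percent: float) -> str:
    """Convert a percentage of node memory into a Kubernetes "Mi" quantity.

    Rounds down to whole MiB so the limit never exceeds the requested share.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    limit_bytes = node_allocatable_bytes * percent / 100
    return f"{math.floor(limit_bytes / MIB)}Mi"


print(memory_limit_from_percent(8 * GIB, 25))   # 2048Mi on host X1 (Y1 = 8 GiB)
print(memory_limit_from_percent(16 * GIB, 25))  # 4096Mi on host X2 (2 * Y1)
print(memory_limit_from_percent(8 * GIB, 10))   # 819Mi
```

One caveat with this design: a pod's limits are set when the pod is created, before the scheduler chooses a node, so a value resolved this way does not by itself adapt when the pod moves from X1 to X2.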

taylor • Jul 07 '22 21:07

https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits

While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow. Adding requests and limits to your Pods and Namespaces only takes a little extra effort, and can save you from running into many headaches down the line!

https://hub.armosec.io/docs/c-0009

CPU and memory resources should have a limit set for every container or a namespace to prevent resource exhaustion.

https://docs.microsoft.com/en-us/azure/defender-for-cloud/recommendations-reference

Container CPU and memory limits should be enforced:

Enforcing CPU and memory limits prevents resource exhaustion attacks (a form of denial of service attack). We recommend setting limits for containers to ensure the runtime prevents the container from using more than the configured resource limit.

https://portal.azure.com/#view/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2Fe345eecc-fa47-480f-9e88-67dcc122b164

Kubernetes cluster containers CPU and memory resource limits should not exceed the specified limits
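The linked Azure Policy definition goes one step beyond presence and compares limits against configured maxima. A minimal sketch of that kind of check, assuming a dict-shaped pod spec; the function names and the maxima in the usage line are hypothetical, this is not Azure Policy's implementation, and the quantity parser understands only a small subset of Kubernetes quantity syntax (millicores, plain cores, and Ki/Mi/Gi).

```python
# Hypothetical "limits must not exceed a maximum" check, not Azure Policy's
# implementation. Only CPU cores/millicores and Ki/Mi/Gi memory are parsed.
MEMORY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_millicores(quantity: str) -> int:
    """Parse "500m" or "1.5" (cores) into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Parse "256Mi", "1Gi", etc. into bytes; a bare number is already bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def containers_over_max(pod_spec: dict, max_cpu: str, max_memory: str) -> list[str]:
    """Names of containers whose CPU or memory limit exceeds the maximum."""
    cpu_cap, memory_cap = cpu_millicores(max_cpu), memory_bytes(max_memory)
    offenders = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        cpu_over = "cpu" in limits and cpu_millicores(limits["cpu"]) > cpu_cap
        memory_over = "memory" in limits and memory_bytes(limits["memory"]) > memory_cap
        if cpu_over or memory_over:
            offenders.append(container["name"])
    return offenders


pod_spec = {
    "containers": [
        {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "worker", "resources": {"limits": {"cpu": "4", "memory": "1Gi"}}},
        {"name": "cache", "resources": {"limits": {"cpu": "1", "memory": "3Gi"}}},
    ]
}

print(containers_over_max(pod_spec, max_cpu="2", max_memory="2Gi"))  # ['worker', 'cache']
```

A container with no limits at all passes this check, so it complements the presence-only check discussed earlier in the thread rather than replacing it.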

wvwatson • Sep 05 '22 20:09

No AC required.

agentpoyo • Sep 12 '22 14:09

@denverwilliams @wavell @agentpoyo what is the level of effort in points for this issue (0,1,2,3,5,8,13)?

lixuna • Sep 19 '22 21:09

@denverwilliams @wavell @agentpoyo what is the level of effort in points for this issue (0,1,2,3,5,8,13)?

lixuna • Oct 06 '22 20:10