compliantkubernetes-apps

Apply chaos engineering on node-local-dns

Open • aarnq opened this issue 1 year ago • 1 comment

Concept to investigate

We've seen that node-local-dns can be heavily impacted by network issues. In its current state it can quickly start using a lot of memory, which may have knock-on effects that cause nodes to go OOM. (Growth from ~20MiB to over 1GiB has been seen in the wild.)

So, we should apply some chaos engineering and test different configurations to find one that handles network issues better.

Artefacts to capture

Updated node-local-dns configuration based on chaos experiments.

Additional context

No response

aarnq, Jan 12 '24 14:01

It would also be worth investigating whether lowering the max_concurrent setting in coredns has any effect on the node-local-dns memory usage spikes.

davidumea, Feb 20 '24 15:02
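
As a starting point for choosing max_concurrent values to try in the experiments, here is a rough sizing sketch. It assumes the guidance in the CoreDNS forward plugin documentation: pick a value above the upstream query rate multiplied by the upstream latency, and budget roughly 2 kb of memory per concurrent query. The function names and the example numbers (500 queries per second, 2 seconds of latency, a 64MiB budget) are hypothetical and not taken from this issue. In-flight queries are only one contributor to the memory of the process, so measured usage may differ.

```python
import math

# Assumption: approximate per-query memory cost, as suggested by the
# CoreDNS forward plugin documentation. Not measured on node-local-dns.
BYTES_PER_CONCURRENT_QUERY = 2 * 1024


def min_max_concurrent(queries_per_second: float, upstream_latency_seconds: float) -> int:
    """Lower bound: queries expected to be in flight (rate * latency)."""
    return math.ceil(queries_per_second * upstream_latency_seconds)


def memory_for_max_concurrent(max_concurrent: int) -> int:
    """Estimated bytes held by in-flight queries when the limit is reached."""
    return max_concurrent * BYTES_PER_CONCURRENT_QUERY


def max_concurrent_for_budget(memory_budget_bytes: int) -> int:
    """Upper bound: largest limit whose in-flight queries fit in the budget."""
    return memory_budget_bytes // BYTES_PER_CONCURRENT_QUERY


if __name__ == "__main__":
    # Hypothetical degraded network: 500 queries/s, upstream answers take 2 s.
    lower = min_max_concurrent(500, 2.0)
    upper = max_concurrent_for_budget(64 * 1024 * 1024)
    print(f"candidate range for max_concurrent: {lower} to {upper}")
    print(f"memory at lower bound: {memory_for_max_concurrent(lower)} bytes")
```

A chaos experiment could then sweep max_concurrent across this range while injecting latency or packet loss, and record the memory of the node-local-dns pods for each value.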