compliantkubernetes-apps
Apply chaos engineering on node-local-dns
Concept to investigate
We've seen that node-local-dns can be heavily impacted by network issues. In its current state it can quickly start using a lot of memory, which may have knock-on effects that cause nodes to go OOM. (Growth from ~20MiB to over 1GiB has been seen in the wild.)
So, we should apply some chaos engineering and test different configurations to find one that handles network issues better.
Artefacts to capture
Updated node-local-dns configuration based on chaos experiments.
Additional context
No response
It would also be worth investigating whether lowering the max_concurrent setting in CoreDNS has any effect on the node-local-dns memory usage spikes.
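
For reference, max_concurrent is an option of the CoreDNS forward plugin. A minimal sketch of where it would go is shown here. The zone, upstream address (10.96.0.10) and limit (1000) are illustrative assumptions rather than values from our configuration, and which Corefile to change (node-local-dns or the cluster CoreDNS) is part of what the experiment should determine.

```
# Hypothetical Corefile fragment; all values are placeholders to experiment with.
.:53 {
    errors
    cache 30
    # Assumed upstream: the cluster DNS service IP.
    forward . 10.96.0.10 {
        # Cap the number of in-flight upstream queries. Queries above the
        # cap are answered with REFUSED instead of being held in memory.
        max_concurrent 1000
    }
}
```

According to the forward plugin documentation, the limit should be larger than the expected upstream query rate multiplied by the upstream latency, so a value that is too low would likely cause refused queries during normal load as well as during network issues.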