node-problem-detector
node-problem-detector not able to detect kernel log events for a Kind cluster
I've been trying to run node-problem-detector on a local kind cluster with 3 nodes (1 master, 2 workers). After installing it as a DaemonSet, I see three pods running across all three nodes, including the master. Also, when I pass a kernel message as a test, I don't see any events generated, either in the NPD pod or in the node's description.
You may need to tune your DaemonSet YAML:
- Update the node selector to ignore master, or remove the node-problem-detector label on master. https://github.com/kubernetes/node-problem-detector/blob/13b65d06e9513e82a6cad649988f33dd10f92f29/deployment/node-problem-detector.yaml#L11
- For watching kernel messages, it depends on how your NPD is configured in the DaemonSet. https://github.com/kubernetes/node-problem-detector/blob/13b65d06e9513e82a6cad649988f33dd10f92f29/deployment/node-problem-detector.yaml#L31
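For the kernel-message case, it may help to first check that the test line you plan to inject actually matches the pattern of the rule you expect to fire. The sketch below is only an illustration: the pattern is assumed to be the KernelOops rule from the sample kernel-monitor.json, so copy the real one from the config mounted in your pod, and the helper name matches_rule is hypothetical. It uses grep -E, which is close to but not identical to the regex syntax NPD uses, so treat a match here as a sanity check rather than proof.

```shell
#!/bin/sh
# Assumed rule pattern; replace it with the one from the kernel-monitor.json
# that your NPD pod actually loads.
PATTERN='BUG: unable to handle kernel NULL pointer dereference at .*'

# Test line you intend to write to the kernel log.
MSG='kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING'

# matches_rule MESSAGE PATTERN
# Succeeds if MESSAGE matches the extended regex PATTERN (hypothetical helper).
matches_rule() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

if matches_rule "$MSG" "$PATTERN"; then
    echo "match"
else
    echo "no match"
fi

# To inject the line afterwards (needs root; on kind this writes to the
# kernel log shared with the host):
#   sudo sh -c "echo '$MSG' >> /dev/kmsg"
```

If the line matches locally but no event shows up, the next thing to check is probably which monitor configs the DaemonSet passes to NPD and whether the kernel log is readable inside the pod.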
Note: kind clusters are sharing the host kernel with sketchy isolation.
What's the use case for NPD-on-kind?
It's local testing and CI in my case.
For testing NPD, a fake or a remote VM should be used. We shouldn't introduce issues into the CI host's kernel, and if we don't, then there won't be any for NPD to see.
For local development, you could use a VM, local-up-cluster.sh, or kubeadm init.
kind generally attempts to create a container that looks like a node, but it runs on a shared kernel, inside a container, which kubelet doesn't clearly support.
In general, kind works best for testing API interactions and node-to-node interactions, but unfortunately not kernel, host, or resource-limit behavior for now.
Just in case it helps other people, the following configuration works pretty well with my KinD installation:
--config.system-log-monitor=/config/kernel-monitor.json,/config/systemd-monitor.json \
--config.custom-plugin-monitor=/config/iptables-mode-monitor.json,/config/network-problem-monitor.json,/config/kernel-monitor-counter.json,/config/systemd-monitor-counter.json
That helped me to quickly understand what's going on behind the scenes, and then deploy node-problem-detector in our clusters.