node-problem-detector not able to detect kernel log events for a Kind cluster

Open pravarag opened this issue 1 year ago • 5 comments

I've been trying to run node-problem-detector on a local kind cluster with 3 nodes (1 master, 2 workers). After installing it as a DaemonSet, I first noticed that three pods are running across the three nodes, including the master. Also, when I pass a kernel message as a test, I don't see any events generated, either in the NPD pod or in the node's description.
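
For reference, a test of this kind can be sketched roughly as below. This is only a sketch under assumptions: the helper names `run`, `inject_test_message` and `show_node_events` are hypothetical, the node container name `kind-worker` assumes a kind cluster created with the default name, and the test message mirrors the kernel oops line commonly used to exercise NPD. Because kind nodes share the host kernel, the message goes to the host's kernel log, and whether NPD sees it depends on how `/dev/kmsg` is exposed inside the node container.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of NPD or kind.
TEST_MSG='kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING'

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# inject_test_message NODE: append TEST_MSG to /dev/kmsg inside the
# kind node container NODE (assumed name, e.g. "kind-worker").
inject_test_message() {
  run docker exec "$1" sh -c "echo '$TEST_MSG' >> /dev/kmsg"
}

# show_node_events NODE: list events whose involved object is the node NODE.
show_node_events() {
  run kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1"
}
```

Running `DRY_RUN=1 inject_test_message kind-worker` prints the command instead of executing it, which is a safer first step on a shared CI host.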

Feb 07 '24 04:02 pravarag

You may need to tune your daemonset yaml

Update the node selector to ignore master, or remove the node-problem-detector label on master. https://github.com/kubernetes/node-problem-detector/blob/13b65d06e9513e82a6cad649988f33dd10f92f29/deployment/node-problem-detector.yaml#L11
For watching kernel messages, it depends on how your NPD is configured in the daemonset. https://github.com/kubernetes/node-problem-detector/blob/13b65d06e9513e82a6cad649988f33dd10f92f29/deployment/node-problem-detector.yaml#L31
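
As a rough illustration of why the configuration matters: a system log monitor only reports lines that match one of its rule patterns, so a test message that matches no rule produces no event. The sketch below checks a message against a single pattern. The helper name `matches_rule` is hypothetical, and `PATTERN` is an assumption modeled on the KernelOops rule in the default kernel-monitor.json, so compare it with the file your DaemonSet actually mounts. `grep -E` only approximates the Go regular expressions NPD uses.

```shell
#!/bin/sh
# Assumed rule pattern (modeled on the KernelOops rule in the default
# kernel-monitor.json); verify against your own config.
PATTERN='BUG: unable to handle kernel NULL pointer dereference at .*'

# matches_rule MESSAGE [PATTERN]
# Returns 0 if MESSAGE matches the pattern, non-zero otherwise.
# grep -E is an approximation of Go regexp, so treat this as a sanity check.
matches_rule() {
  printf '%s\n' "$1" | grep -Eq -- "${2:-$PATTERN}"
}
```

For example, `matches_rule 'kernel: hello from a test' || echo "no rule would fire"` shows that an arbitrary message would be ignored under this pattern.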

Apr 05 '24 16:04 wangzhen127

Note: kind clusters are sharing the host kernel with sketchy isolation.

What's the use case for NPD-on-kind?

Jun 26 '24 20:06 BenTheElder

> Note: kind clusters are sharing the host kernel with sketchy isolation.
>
> What's the use case for NPD-on-kind?

It's local testing and CI in my case.

Sep 17 '24 07:09 cmontemuino

For testing NPD, a fake or a remote VM should be used. We shouldn't introduce issues into the CI host's kernel, and if we don't, then we won't see any?

For local development, you could use a VM, local-up-cluster.sh, or kubeadm init.

kind generally attempts to create a container that appears like a node, but it's on a shared kernel, in a container, which kubelet doesn't clearly support.

In general, kind works best for testing API interactions and node-to-node interactions, but unfortunately not kernel / host / resource limits for now.

Sep 17 '24 15:09 BenTheElder

Just in case it helps other people, the following configuration works pretty well with my KinD installation:

--config.system-log-monitor=/config/kernel-monitor.json,/config/systemd-monitor.json \
--config.custom-plugin-monitor=/config/iptables-mode-monitor.json,/config/network-problem-monitor.json,/config/kernel-monitor-counter.json,/config/systemd-monitor-counter.json

That helped me to quickly understand what's going on behind the scenes, and then deploy node-problem-detector in our clusters.

Sep 18 '24 06:09 cmontemuino
