node-problem-detector
This is a place for various problem detectors running on the Kubernetes nodes.
Hi, the health-checker-kubelet monitor runs `systemctl kill` against the kubelet service, but kubelet doesn't get restarted because the exit code would be 0. EKS containerd nodes only restart kubelet on...
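To show why a clean exit can leave kubelet down, here is a minimal sketch of how systemd decides whether to restart a unit under a few `Restart=` policies. It assumes the kubelet unit uses a policy such as `Restart=on-failure`, which the snippet above does not confirm. It follows the rule in systemd.service(5) that a non-zero exit status or an "unclean" signal triggers a restart, while SIGHUP, SIGINT, SIGTERM and SIGPIPE count as clean. `Termination` and `wouldRestart` are hypothetical names. Timeouts, watchdogs, `SuccessExitStatus=` and the other policies are not modeled.

```go
package main

import "fmt"

// Termination is a simplified, hypothetical description of how a unit's main process ended.
type Termination struct {
	ExitCode int    // exit status when the process exited on its own
	Signal   string // signal name if the process was killed by a signal, e.g. "SIGKILL"; empty otherwise
}

// cleanSignals are the signals systemd treats as a clean termination
// when evaluating Restart=on-failure (see systemd.service(5)).
var cleanSignals = map[string]bool{
	"SIGHUP": true, "SIGINT": true, "SIGTERM": true, "SIGPIPE": true,
}

// wouldRestart reports whether systemd would restart the unit for the
// "no", "always" and "on-failure" policies. Any other policy is treated as "no" here.
func wouldRestart(policy string, t Termination) bool {
	switch policy {
	case "always":
		return true
	case "on-failure":
		if t.Signal != "" {
			// Killed by a signal: restart only if the signal is not a "clean" one.
			return !cleanSignals[t.Signal]
		}
		// Exited on its own: restart only on a non-zero status.
		return t.ExitCode != 0
	default:
		return false
	}
}

func main() {
	// kubelet catching the default SIGTERM from `systemctl kill` and exiting with status 0.
	graceful := Termination{ExitCode: 0}
	fmt.Println(wouldRestart("on-failure", graceful))                 // false
	fmt.Println(wouldRestart("always", graceful))                     // true
	fmt.Println(wouldRestart("on-failure", Termination{ExitCode: 1})) // true
}
```

Even if kubelet did not handle SIGTERM, a plain SIGTERM kill is still "clean" for `on-failure`. So under that assumed policy the unit stays down either way, unless the monitor forces a non-zero exit or an unclean signal.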
I've been trying to run node-problem-detector on a local kind cluster with 3 nodes (1 master, 2 workers). After installing it as a DaemonSet, the first thing I see is that there are three...
Add a 'node' label to metrics exported for Prometheus. The metrics obtained from Prometheus carry no node information, which prevents grouping and statistical analysis by node.
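To show what the requested change would look like on the wire, here is a stdlib-only sketch that renders one sample in the Prometheus text exposition format with an extra `node` label. The metric name `problem_counter`, the `reason` value, and reading the node name from a `NODE_NAME` environment variable (commonly injected through the Downward API's `spec.nodeName`) are illustrative assumptions. They are not taken from node-problem-detector's exporter code. `formatSample` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// labelEscaper escapes label values as the text exposition format requires:
// backslash, double quote and line feed.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders a single sample line and adds a "node" label so
// series can be grouped per node. Labels are sorted for deterministic output.
func formatSample(name string, labels map[string]string, node string, value float64) string {
	merged := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		merged[k] = v
	}
	merged["node"] = node

	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, k, labelEscaper.Replace(merged[k])))
	}
	return fmt.Sprintf("%s{%s} %s", name, strings.Join(pairs, ","),
		strconv.FormatFloat(value, 'g', -1, 64))
}

func main() {
	// Hypothetical: the node name is injected into the pod as NODE_NAME.
	node := os.Getenv("NODE_NAME")
	if node == "" {
		node = "unknown"
	}
	fmt.Println(formatSample("problem_counter", map[string]string{"reason": "TaskHung"}, node, 3))
}
```

With the label present, a query such as `sum by (node) (...)` can group problem counts per node.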
CPER (Common Platform Error Record) is the format used to describe platform hardware errors in various ACPI tables, such as ERST, BERT and HEST. The event severity message is printed here: https://github.com/torvalds/linux/blob/v6.7/drivers/firmware/efi/cper.c#L639 Examples are...
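A log-based detector for these messages would need to pull the severity out of kernel log lines. Here is a minimal sketch of that step. It assumes the kernel prints `event severity: <word>`, with the word being one of `recoverable`, `fatal`, `corrected`, `info` or `unknown`, as we read `cper_severity_str` in `drivers/firmware/efi/cper.c`. Only that fragment is matched, because the prefix before it (for example `{1}[Hardware Error]:`) can vary. `cperSeverity` is a hypothetical helper, and the sample line is illustrative rather than captured from a real system.

```go
package main

import (
	"fmt"
	"regexp"
)

// severityRe matches the severity fragment of a CPER event line.
// Assumption: the severity words are those returned by the kernel's cper_severity_str.
var severityRe = regexp.MustCompile(`event severity: (recoverable|fatal|corrected|info|unknown)\b`)

// cperSeverity returns the severity found in a kernel log line, or "" if the line has none.
func cperSeverity(line string) string {
	m := severityRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Illustrative line; the real prefix depends on the kernel version and error source.
	line := "{1}[Hardware Error]: event severity: corrected"
	fmt.Println(cperSeverity(line)) // corrected
}
```

How each severity should map to a node-problem-detector event or condition (for example, treating `fatal` differently from `corrected`) is a separate policy decision that this sketch leaves open.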
Add MartinForReal as a reviewer. PRs: https://github.com/kubernetes/node-problem-detector/pull/756 https://github.com/kubernetes/node-problem-detector/pull/760 https://github.com/kubernetes/node-problem-detector/pull/769 https://github.com/kubernetes/node-problem-detector/pull/773 https://github.com/kubernetes/node-problem-detector/pull/774 https://github.com/kubernetes/node-problem-detector/pull/793 https://github.com/kubernetes/node-problem-detector/pull/806 https://github.com/kubernetes/node-problem-detector/pull/820 https://github.com/kubernetes/node-problem-detector/pull/811 (in review). Reviewed PRs: https://github.com/kubernetes/node-problem-detector/pull/823 https://github.com/kubernetes/node-problem-detector/pull/806 https://github.com/kubernetes/node-problem-detector/pull/774 https://github.com/kubernetes/node-problem-detector/pull/761
The problem occurred when the filesystem went into read-only mode. That was fixed, but the metrics still showed the counter and gauge set up to...
Updating go.mod with the latest dependencies...
A vulnerability scan showed a CVE for `NPD:v0.8.19`: NVD CVE-2023-4911 (published 2023-10-03, modified 2024-02-22, CVSS v3: 7.8). Description: A buffer overflow was discovered in the GNU C Library's dynamic...
I modeled IO delay detection rules on the NTPProblem rules in node-problem-detector/config/custom-plugin-monitor.json; that is, I added two rules, one temporary and one permanent, and wrote the results of each...