node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.

Results: 51 node-problem-detector issues

Hi, the health-checker-kubelet monitor runs `systemctl kill` against the kubelet service, but kubelet doesn't get restarted because the exit code would be 0. EKS containerd nodes only restart kubelet on...

I've been trying to run node-problem-detector on a local kind cluster with 3 nodes (1 master, 2 workers). After installing it as a DaemonSet, the first thing I'm seeing is that there are three...

Add a 'node' label to the metrics exported to Prometheus. The metrics obtained from Prometheus do not carry node information, which prevents grouping and statistical analysis by node.

cncf-cla: yes
size/M
ok-to-test

CPER is the format used by various tables, such as ERST, BERT, and HEST, to describe platform hardware errors. The event severity message is printed here: https://github.com/torvalds/linux/blob/v6.7/drivers/firmware/efi/cper.c#L639 Examples are...

lgtm
cncf-cla: yes
size/S
ok-to-test

Add MartinForReal as a reviewer.
PRs:
https://github.com/kubernetes/node-problem-detector/pull/756
https://github.com/kubernetes/node-problem-detector/pull/760
https://github.com/kubernetes/node-problem-detector/pull/769
https://github.com/kubernetes/node-problem-detector/pull/773
https://github.com/kubernetes/node-problem-detector/pull/774
https://github.com/kubernetes/node-problem-detector/pull/793
https://github.com/kubernetes/node-problem-detector/pull/806
https://github.com/kubernetes/node-problem-detector/pull/820
https://github.com/kubernetes/node-problem-detector/pull/811 (in review)
Reviewed PRs:
https://github.com/kubernetes/node-problem-detector/pull/823
https://github.com/kubernetes/node-problem-detector/pull/806
https://github.com/kubernetes/node-problem-detector/pull/774
https://github.com/kubernetes/node-problem-detector/pull/761

lgtm
cncf-cla: yes
size/XS

This should allow me to create releases.

cncf-cla: yes
size/XS

The problem occurred when the filesystem went into read-only mode. That was fixed, but in the metrics I could still see the counter and gauge set up to...

Updating go.mod with latest dependencies...

lgtm
cncf-cla: yes
size/L
ok-to-test

A vulnerability scan showed a CVE for `NPD:v0.8.19` ``` NVD CVE-2023-4911 Published: 2023-10-03 - Modified: 2024-02-22 CVSS v3: 7.8 Description A buffer overflow was discovered in the GNU C Library's dynamic...

Following the NTPProblem rules in node-problem-detector/config/custom-plugin-monitor.json, I added I/O delay detection rules. That is, I added two rules, one temporary and one permanent, and wrote the results of each...