node-problem-detector
This is a place for various problem detectors running on the Kubernetes nodes.
In the current NPD logMonitor architecture, the `pluginConfig.message` regex captures a string that is included in the node condition or Event if a fault...
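The regex-capture behavior described above can be illustrated with a minimal sketch. It is not NPD's actual implementation: `matchProblem`, the pattern, and the sample log line are hypothetical and only show how the text matched by a rule's regex could become the message attached to a condition or Event.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchProblem is a hypothetical helper, not part of NPD. It applies a rule
// pattern to one log line and returns the matched text, which a monitor could
// then use as the message of a node condition or Event.
func matchProblem(pattern, line string) (string, bool) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", false
	}
	loc := re.FindStringIndex(line)
	if loc == nil {
		return "", false
	}
	return line[loc[0]:loc[1]], true
}

func main() {
	// Both the pattern and the log line are made up for illustration.
	pattern := `task \S+ blocked for more than \d+ seconds\.`
	line := "INFO: task docker:1234 blocked for more than 120 seconds."
	if msg, ok := matchProblem(pattern, line); ok {
		fmt.Println(msg)
	}
}
```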
https://testgrid.k8s.io/presubmits-node-problem-detector#pull-npd-e2e-test has recently started failing. ``` [1] NPD should export Prometheus metrics. When OOM kills and docker hung happen [1] NPD should update problem_counter and problem_gauge [1] /home/prow/go/src/k8s.io/node-problem-detector/test/e2e/metriconly/metrics_test.go:158 [2] error...
This PR adds a revival mechanism for a recurring issue I'm facing. In Kubernetes, when a node is under significant load, the connection to /dev/kmsg can be closed unexpectedly. Instead...
This PR adds two flags: 1. enable deprecated condition type deletion (bool, defaults to false) 2. a CSV string of condition type names to delete (string, ignored if 1 is false)...
I am planning to perform manual Kubernetes testing on the `node-problem-detector` image (installed via Helm) to simulate node issues, similar to `problem-maker`. However, if I understand correctly, `problem-maker` cannot be...
Another example of potential internal application health enhancements to satisfy #1006
[ci-npd-build](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-build&width=20) has been flaky since 1/8/2025, with only a 40% pass rate. ``` #26 [linux/arm64 builder 5/5] RUN GOARCH=arm64 make bin/node-problem-detector bin/health-checker bin/log-counter #26 289.5 runtime/cgo: aarch64-linux-gnu-gcc: signal: segmentation fault (core...
OpenCensus has been deprecated. We should migrate to OpenTelemetry. See https://opentelemetry.io/blog/2023/sunsetting-opencensus/
I would like to start a conversation about reviving closed kmsg channels. https://github.com/kubernetes/node-problem-detector/pull/1004 supplies a recovery mechanism which, when configured in the plugin's config, will revive a closed kmsg channel....