node-problem-detector
Networking probes
Talked with @danwinship offline: it would be nice to be able to report
- Pod network problems
- Cluster DNS problems
- Service problems
It should not be hard to provide a few basic plugins that run basic checks, if we can figure out the best way to obtain the information needed.
/cc @danwinship
/assign @aojea
We currently run some pods, such as CNI network plug-ins, and we also want to detect their health status.
We now use custom plugins to detect whether a specific pod is alive. Perhaps HealthChecker should also support monitoring the status of a specific pod. 🤔
pods such as cni network plug-ins
Why are the existing Pod probes not sufficient?
A pod probe tells us whether the pod is alive, but it does not seem to translate into a node condition. We run a custom DaemonSet, and when its agent has crashed on a node, we want pods not to be scheduled onto that node. If this feature is not provided, we will need to implement something similar ourselves.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
We are also interested in these checks.
/remove-lifecycle rotten
/cc
/unassign
I'm not able to work on this soon, but it is a good opportunity to contribute, @aroradaman @adrianmoisey ... strawman approach:
- test that kubernetes.default resolves; this name is guaranteed to exist, so the lookup exercises the cluster DNS
- test that curl https://kubernetes.default works; this tests that Services work
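A rough sketch of what the two strawman checks could look like as a plugin-style script, not an actual node-problem-detector plugin. Assumptions: the exit codes 0 (OK) and 1 (NonOK) follow the custom plugin convention as I understand it, the function names (`check_dns`, `check_service`, `run_checks`) are made up for illustration, and the Services check is simplified to a plain TCP connect on port 443 instead of a full HTTPS request like curl would do. The resolver and connector are passed in as parameters so the logic can be exercised without a cluster.

```python
import socket

# Exit codes assumed from the node-problem-detector custom plugin convention.
OK, NON_OK = 0, 1


def check_dns(name, resolve=socket.getaddrinfo):
    """Return (ok, message) for a DNS lookup of `name`."""
    try:
        resolve(name, 443)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        return False, f"DNS lookup for {name} failed: {err}"
    return True, f"{name} resolves"


def check_service(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Return (ok, message) for a TCP connect to host:port.

    Simplification: this only opens a TCP connection; it does not perform
    the TLS handshake or HTTP request that `curl https://...` would.
    """
    try:
        with connect((host, port), timeout=timeout):
            pass
    except OSError as err:  # covers timeouts and refused connections
        return False, f"connect to {host}:{port} failed: {err}"
    return True, f"{host}:{port} is reachable"


def run_checks(name="kubernetes.default",
               resolve=socket.getaddrinfo,
               connect=socket.create_connection):
    """Run the DNS check, then the Service check; return (exit_code, message)."""
    ok, msg = check_dns(name, resolve=resolve)
    if not ok:
        return NON_OK, msg
    ok, msg = check_service(name, connect=connect)
    if not ok:
        return NON_OK, msg
    return OK, "cluster DNS and Services look healthy"
```

A plugin script built on this would call `run_checks()`, print the message, and pass the code to `sys.exit`. Note that the short name kubernetes.default only resolves where the cluster DNS search path is configured, so where the check runs matters.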
Hmmm... how have I never seen this project before? This looks like a fun one, thanks for the tag. /assign
(@aroradaman happy to pair with you if you want?)
/lifecycle stale
/remove-lifecycle stale