node-problem-detector Networking probes

Talked with @danwinship offline; it would be nice to be able to report:

Pod network problems
Cluster DNS problems
Services problems

It should not be hard to provide a few plugins that run basic checks, if we can figure out the best way to obtain the information needed.

Nov 26 '24 21:11 aojea

/cc @danwinship /assign @aojea

Nov 26 '24 21:11 aojea

We currently run some pods, such as CNI network plug-ins, and we also want to detect their health status.

We now use custom plugins to detect whether a specific pod is alive. Perhaps HealthChecker should also support monitoring the status of a specific pod for inspection. 🤔

Dec 26 '24 09:12 googs1025

pods such as cni network plug-ins

Why are the existing Pod probes not sufficient for this?

Dec 29 '24 15:12 aojea

A pod probe tells us whether the pod is alive, but its result does not seem to be converted into a node condition. We run a custom DaemonSet, and when its agent on a node has crashed, we want pods not to be scheduled onto that node. If this feature is not provided, we will need to implement something similar ourselves.
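
To make the idea concrete, here is a minimal sketch of what such a custom plugin check could look like. It assumes the agent listens on a node-local TCP port (`127.0.0.1:9099` is a made-up address) and that node-problem-detector's custom plugin monitor treats exit code 0 as healthy, 1 as a problem, and 2 as unknown, which is my understanding of the convention. The function names are hypothetical.

```python
import socket

# Exit codes, assuming the node-problem-detector custom plugin convention.
OK, NONOK, UNKNOWN = 0, 1, 2


def check_agent(host="127.0.0.1", port=9099, timeout=2.0,
                connect=socket.create_connection):
    """Return (exit_code, message) for a TCP liveness check of a node-local agent.

    host and port are hypothetical. `connect` is injectable so the logic
    can be exercised without a real agent running.
    """
    try:
        conn = connect((host, port), timeout)
    except (ConnectionRefusedError, TimeoutError, socket.timeout) as exc:
        # The agent is not accepting connections: report a problem.
        return NONOK, f"agent not reachable on {host}:{port}: {exc}"
    except OSError as exc:
        # Any other socket error says nothing certain about the agent itself.
        return UNKNOWN, f"could not run check: {exc}"
    conn.close()
    return OK, f"agent reachable on {host}:{port}"


def main():
    code, message = check_agent()
    print(message)  # the plugin's stdout is used as the condition message
    return code
```

A real plugin script would end with `raise SystemExit(main())` so that the exit code reaches the monitor. Whether a resulting node condition keeps pods off the node depends on what consumes that condition, which this sketch does not cover.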

Jan 04 '25 14:01 googs1025

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 04 '25 15:04 k8s-triage-robot

/lifecycle rotten

May 04 '25 15:05 k8s-triage-robot

We are also interested in these checks.

May 23 '25 12:05 pbol-samedi

/remove-lifecycle rotten

May 23 '25 12:05 pbol-samedi

/cc

Jul 17 '25 22:07 ajaysundark

/unassign

I'm not able to work on this soon, but it is a good opportunity to contribute, @aroradaman @adrianmoisey ... a strawman approach:

Test that kubernetes.default resolves; this name is guaranteed to exist, so this tests the cluster DNS.

Test that curl https://kubernetes.default works; this tests that Services work.
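
A rough sketch of those two checks, under a few assumptions: the check runs with the cluster DNS configuration so that the short name `kubernetes.default` resolves, a plain TCP connect to port 443 stands in for `curl` to avoid TLS and auth setup, and the exit codes follow what I understand to be the custom plugin convention (0 healthy, 1 problem). All function names are hypothetical.

```python
import socket

OK, NONOK = 0, 1  # exit codes, assuming NPD's custom plugin convention


def check_dns(name="kubernetes.default", resolve=socket.getaddrinfo):
    """Resolve the API server Service name; return its IPs (empty on failure)."""
    try:
        infos = resolve(name, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_service(ips, port=443, timeout=2.0, connect=socket.create_connection):
    """Return True if a TCP connection to any resolved Service IP succeeds."""
    for ip in ips:
        try:
            connect((ip, port), timeout).close()
            return True
        except OSError:
            continue
    return False


def run_checks(resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Run the DNS check, then the Service check; return (exit_code, message)."""
    ips = check_dns(resolve=resolve)
    if not ips:
        return NONOK, "kubernetes.default does not resolve"
    if not check_service(ips, connect=connect):
        return NONOK, "kubernetes.default resolves but is unreachable"
    return OK, "cluster DNS and Service path look healthy"
```

Running the DNS check first lets the message tell a DNS failure apart from a Service failure. The `resolve` and `connect` parameters exist only so the logic can be tested without a cluster.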

Jul 20 '25 17:07 aojea

Hmmm... how have I never seen this project before? This looks like a fun one, thanks for the tag. /assign

(@aroradaman happy to pair with you if you want?)

Jul 20 '25 18:07 adrianmoisey

/lifecycle stale

Oct 18 '25 18:10 k8s-triage-robot

/remove-lifecycle stale

Oct 27 '25 15:10 adrianmoisey
