node-problem-detector

Networking probes

Open aojea opened this issue 1 year ago • 4 comments

Talked with @danwinship offline; it would be nice to be able to report:

  • Pod network problems
  • DNS cluster problems
  • Services problems

It should not be hard to provide a few basic plugins that run basic checks, if we can figure out the best way to obtain the information needed.

aojea avatar Nov 26 '24 21:11 aojea

/cc @danwinship /assign @aojea

aojea avatar Nov 26 '24 21:11 aojea

We currently run some pods, such as CNI network plug-ins, and we also want to detect their health status.

We now use custom plugins to detect whether a specific pod is alive. Perhaps HealthChecker should also support monitoring the status of a specific pod. 🤔

googs1025 avatar Dec 26 '24 09:12 googs1025

pods such as cni network plug-ins

Why are the current Pod probes not enough for this?

aojea avatar Dec 29 '24 15:12 aojea

A pod probe tells us whether the pod is alive, but it does not seem to translate into a node condition. We run a custom daemonset, and when a daemonset agent has crashed on a node, we want pods not to be scheduled to that node. If this feature is not provided, we need to implement something similar ourselves.

googs1025 avatar Jan 04 '25 14:01 googs1025

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 04 '25 15:04 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar May 04 '25 15:05 k8s-triage-robot

We are also interested in these checks.

pbol-samedi avatar May 23 '25 12:05 pbol-samedi

/remove-lifecycle rotten

pbol-samedi avatar May 23 '25 12:05 pbol-samedi

/cc

ajaysundark avatar Jul 17 '25 22:07 ajaysundark

/unassign

I'm not able to work on this soon, but it is a good opportunity to contribute, @aroradaman @adrianmoisey. A strawman approach:

  • Test that kubernetes.default resolves; this name is guaranteed to exist, so the check exercises Kubernetes DNS
  • Test that curl https://kubernetes.default works; this checks that Services work

aojea avatar Jul 20 '25 17:07 aojea

Hmmm... how have I never seen this project before? This looks like a fun one, thanks for the tag. /assign

(@aroradaman happy to pair with you if you want?)

adrianmoisey avatar Jul 20 '25 18:07 adrianmoisey

/lifecycle stale

k8s-triage-robot avatar Oct 18 '25 18:10 k8s-triage-robot

/remove-lifecycle stale

adrianmoisey avatar Oct 27 '25 15:10 adrianmoisey