node-feature-discovery

deployment: add startupProbe for nfd-master

Open marquiz opened this issue 1 year ago • 6 comments

This patch mitigates inadvertent termination of nfd-master pods by the liveness probe on big clusters.

A recent change made nfd-master wait (block) for the informer caches to sync before starting the main loop. As a side effect, the gRPC health endpoint does not respond until the caches have synced. In big clusters, syncing the NodeFeature object cache takes a long time because the objects are big and there is (at least) one per node in the cluster. Thus, in big clusters, the liveness probe kicks in and kills the nfd-master pod before it is ready.
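For illustration, a startup probe of the kind described above could look roughly like the fragment below. This is a sketch and not the actual diff of this patch: the gRPC health port (8082) and every threshold value are assumptions picked for the example. What is standard Kubernetes behavior is that liveness and readiness probes are not run until the startup probe has succeeded, and that the startup budget is `failureThreshold * periodSeconds`.

```yaml
# Illustrative fragment of the nfd-master container spec; not the actual patch.
# The port number and all threshold values are assumptions for this example.
containers:
  - name: nfd-master
    startupProbe:
      grpc:
        port: 8082            # assumed gRPC health port
      periodSeconds: 10
      failureThreshold: 30    # allows up to 30 * 10s = 300s for the caches to sync
    livenessProbe:
      grpc:
        port: 8082            # same assumed port as above
      periodSeconds: 10
      failureThreshold: 3     # only evaluated after the startup probe has succeeded
```

With values like these, a slow cache sync on a big cluster consumes the startup budget instead of tripping the liveness probe, while the liveness probe keeps its short failure window once the pod is up.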

marquiz avatar Jul 25 '24 13:07 marquiz

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marquiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jul 25 '24 13:07 k8s-ci-robot

Deploy Preview for kubernetes-sigs-nfd ready!

| Name | Link |
| --- | --- |
| Latest commit | fb6484fb8dcf7ba4ccb6d0cc552b4c43781e1fc5 |
| Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/675b27cefdb1340008bef43d |
| Deploy Preview | https://deploy-preview-1810--kubernetes-sigs-nfd.netlify.app |

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Jul 25 '24 13:07 netlify[bot]

/assign @ArangoGutierrez

/cc @ahmetb @lxlxok

marquiz avatar Jul 25 '24 13:07 marquiz

@marquiz: GitHub didn't allow me to request PR reviews from the following users: lxlxok.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @ArangoGutierrez

/cc @ahmetb @lxlxok

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 25 '24 13:07 k8s-ci-robot

ping @ArangoGutierrez

marquiz avatar Aug 16 '24 09:08 marquiz

/milestone v0.17

@ArangoGutierrez @TessaIO PTAL

marquiz avatar Dec 12 '24 18:12 marquiz

/lgtm

TessaIO avatar Dec 12 '24 18:12 TessaIO

LGTM label has been added.

Git tree hash: 501e7ceaa07d3088b868d609a260bf1abd5b568a

k8s-ci-robot avatar Dec 12 '24 18:12 k8s-ci-robot