node-problem-detector
tainting and untainting logic implemented via configuration
Which issue(s) this PR fixes:
This PR fixes issue #457.
What this PR does / why we need it:
This PR adds the ability to conditionally taint and untaint a node under specific circumstances. With this improvement, node-problem-detector can be used in conjunction with descheduler. Users should specify taintEnabled, taintKey, taintValue, and taintEffect in config/kernel-monitor.json. If not specified, taintEnabled defaults to false, so NPD will not taint any node. With this improvement, node-problem-detector also removes the taint once the problem is resolved.
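For illustration, a monitor configuration with the new fields might look like the sketch below; the placement of the fields at the top level of the monitor config and the key/value/effect strings are my assumptions, not taken from the PR diff.

```json
{
  "plugin": "kmsg",
  "logPath": "/dev/kmsg",
  "lookback": "5m",
  "source": "kernel-monitor",
  "taintEnabled": true,
  "taintKey": "npd.kubernetes.io/kernel-deadlock",
  "taintValue": "true",
  "taintEffect": "NoSchedule",
  "conditions": [
    {
      "type": "KernelDeadlock",
      "reason": "KernelHasNoDeadlock",
      "message": "kernel has no deadlock"
    }
  ],
  "rules": []
}
```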
Special notes for your reviewer:
This improvement requires updating the ClusterRole used by node-problem-detector. If this PR somehow gets merged to master, the update verb must be added right here.
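For reference, the extra RBAC permission would look roughly like the sketch below; the exact rules already granted to node-problem-detector in the deployment manifests may differ.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-problem-detector
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    # "update" (and/or "patch") on nodes is what lets NPD add and remove taints
    verbs: ["get", "list", "watch", "patch", "update"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
```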
Hi @bilalcaliskan. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I have modified the commit message, so there has been a lot of activity since the PR was opened; sorry for that.
/retest
@bilalcaliskan: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
@bilalcaliskan: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @xueweiz @andyxning
We discussed this a long time back when we first started NPD. The taint can be applied by NPD itself, or by another controller that takes NPD-generated conditions and events as signals. Having another controller is the pattern most people use right now, because a cluster-level controller has a global view and a better understanding of whether it is OK to taint a node or not. For example, if 5 out of 10 nodes have an issue, you may not want NPD on each node to taint its own node, since that may leave a lot of pods with nowhere to run. Instead, a controller could drain the bad nodes one by one and recreate/repair them.
According to the remedy systems section, NPD is able to add a condition to a node that will eventually taint it, and then descheduler will evict pods that do not tolerate the taint.
Or we could take advantage of Taint Based Eviction instead of descheduler. https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions
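For example, pods that should survive such a taint for a while could declare a toleration with tolerationSeconds. This is only a sketch, assuming a NoExecute taint and a made-up key npd.kubernetes.io/kernel-deadlock that should match whatever key NPD is configured to apply:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerating-pod
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
  tolerations:
    # Stay on the node for up to 5 minutes after the taint appears, then get evicted.
    - key: "npd.kubernetes.io/kernel-deadlock"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300
```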
Currently, descheduler supports RemovePodsViolatingNodeTaints and NPD can add conditions; nevertheless, there is no way to rely on only those two components to drain nodes automatically. Draino is still required, am I right?
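For context, the descheduler strategy I'm referring to is enabled via a policy like the following sketch; the API group/version and exact layout may differ between descheduler releases:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  # Evicts pods that no longer tolerate the taints present on their node.
  "RemovePodsViolatingNodeTaints":
    enabled: true
```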
Conditions are mostly informative.
Taints actually affect scheduling decisions, e.g. no longer scheduling any pods to a node, or evicting running pods from a node. That kind of decision should be made by a cluster-level controller; otherwise, in the extreme case where half the nodes decide to taint themselves and evict pods, the cluster may not have enough resources to run those pods.
I agree 👍 I'm just saying that the Remedy Systems section is misleading because it says that NPD + descheduler can do the job together.
I want to avoid using draino, so I wonder how to taint based on a custom condition 🤔
NPD can work with the descheduler only for the predefined conditions (Ready, MemoryPressure...). So the current design is, in my opinion, definitely limited. I understand the decision of splitting responsibility, but the documentation is misleading.
I agree with @azman0101, descheduler does not do the tainting by itself. According to the remedy systems section we can use descheduler for that purpose, but sadly I guess there is just one option for that, and it's draino.
@Random-Liu as far as I know, descheduler does the job only if the specified taints are already on the node. So we would have to use 3 different components to taint nodes on a specific NodeCondition (NPD, draino and descheduler in that scenario)? I guess that's too much effort, especially if you have over 10 k8s clusters in production.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@andyxning can you please review this PR?
Regards.
@Random-Liu @vteratipally @andyxning @xueweiz hello folks!
Is there any chance of this PR being merged to master? It has been waiting for a long time, and if there is no chance of it being merged, I think it is time to close it.
My personal opinion: I love node-problem-detector, and I hope this feature will eventually be implemented, even if not through my PR. It will give much more control to end users and reduce the complexity of combining multiple tools.
/ok-to-test
@vteratipally I am seeing logs like the ones below on the failing e2e tests; do you have any idea about these? I think these logs are not related to the feature added in this PR. Is there anything additional required before running the e2e tests? It looks like an environmental problem, but I can't be sure whether I am missing something :(
BeforeSuite on Node 1 failed. "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory". Reconnecting...
Ginkgo timed out waiting for all parallel procs to report back
Jan 14 20:02:05.626: Timed out after 60.000s.
STEP: Check node-problem-detector posted KernelDeadlock condition on node "e2e-ffa7072c00-429e8-minion-group-1933" Jan 14 20:02:06.200: FAIL: Timed out after 60.000s. Expected success, but got an error: <*errors.errorString | 0xc000856330>: { s: "node condition \"KernelDeadlock\" not found", } node condition "KernelDeadlock" not found
/retest-required
This is actually a known issue; tests have been failing because of some ongoing changes in the kubernetes repo related to dockershim removal. I am working on it.
/retest-required
By the way, this PR looks like it is waiting for another PR on kubernetes/test-infra; just writing that down as a note. And here is the related issue.
since https://github.com/kubernetes/test-infra/pull/24914 is merged, please /retest
Hello again! First of all, @vteratipally, thank you for your help!
As reviewers, @andyxning @xueweiz, WDYT about this change? As I can see from the PR history, lots of people will be very happy if it is merged :)
By the way, merging my own PR is not as important as the feature itself. I will also be glad if anyone else develops this feature with a better implementation.
/retest
Any updates?
Any Updates?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I've played with the taint feature locally for a while. The tests were made with some custom plugins from the uswitch repo.
Under some script scenarios, a node condition can be set to Unknown (e.g. the script returning a status code > 1) and the taint is not applied.
This may be because the code only handles conditions with status True and False.
Is this intended behavior? If so, could it be documented, or adapted to also cover the Unknown scenario?
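To make the scenario concrete, the plugin scripts I tested are essentially of the shape sketched below; the marker-file probe is hypothetical, and the exit-code-to-status mapping reflects my understanding of the custom plugin convention:

```sh
#!/bin/bash
# Custom plugin convention (as I understand it):
#   exit 0  -> OK      (healthy; the permanent condition stays False)
#   exit 1  -> NonOK   (problem; the permanent condition becomes True)
#   exit >1 -> Unknown (the case where the taint was not applied in my test)

# Hypothetical probe: treat a missing marker file as "unknown" rather than "broken".
if [ -e /var/run/my-health-marker ]; then
  exit 0
else
  exit 2
fi
```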
Awesome contribution @bilalcaliskan :+1: