istio
istio copied to clipboard
Untaint controller
This controller runs from istiod and removes not-ready taints from nodes that have CNI pods ready.
Part of https://github.com/istio/istio/issues/48286. The idea is that the user configures the infra provider to add a taint on new nodes, and the untaint controller removes the taints from the nodes as soon as it observes a ready cni pod on the node.
This is a draft as I still have some questions about the code, and wanted some feedback:
- To enable the taint controller, I followed the ambient controller example and used as environment variable. But, there's other configuration options for the taint controller, that I currently pass as pilot command line args. I'm wondering if there's a better way to have all config in one place? (I suspect we'll need to add more options in the future - customizing the name and effect of the taint)
- The CNI might be deployed to a different namespace. I added a helm value in the pilot chart for that. I would be nice if we could re-use the value for
components.cni.namespaceso there's no redundancy, but i'm not clear how does the operator spec gets translated into helm charts/values? - The cni daemonset had a revision label, but the pod template didn't have. Is that intentional? - I added revision label for the pods so for consistency (as i need to watch said pods).
- I wrote an integration test in its own package as I need to install istio with the taint controller enabled. Would be good to know if the way I did it makes sense.
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
😊 Welcome @yuval-k! This is either your first contribution to the Istio istio repo, or it's been a while since you've been here.
You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines by referring to Contributing to Istio.
Thanks for contributing!
Courtesy of your friendly welcome wagon.
The cni daemonset had a revision label, but the pod template didn't have. Is that intentional? - I added revision label for the pods so for consistency (as i need to watch said pods).
No, it's not, revisions have no meaning applied to the CNI DS directly ATM.
Would we ever want to potentially re-taint the node if CNI goes unready, or is this by-design a one-way operation?
Strictly speaking it's not required since if the CNI was down the scheduling of new pods would die anyway (with the inpod model), but it might be cleaner/nicer from a user perspective to lean on k8s scheduling instead, if we're already doing all this anyway.
Would we ever want to potentially re-taint the node if CNI goes unready, or is this by-design a one-way operation?
Strictly speaking it's not required since if the CNI was down the scheduling of new pods would die anyway (with the inpod model), but it might be cleaner/nicer from a user perspective to lean on k8s scheduling instead, if we're already doing all this anyway.
I believe the unavoidable race is a one time thing, as the user doesn't control how a node is added to the cluster.
If the CNI becomes not ready due to a crash of the cni agent pod for example - the cni binary and config are still present, so it's not a problem for this controller to solve.
There might be a role to play here as part of CNI upgrade (as CNI removes its binaries when the CNI pod is stopped), but i think we need to have an holistic upgrade discussion with all the moving parts (including ztunnel), so I consider it out of scope for this PR.
@hzxuzhonghu or @howardjohn can you review at high level to provide some early feedback to Yuval? this is identified as a key blocker for ambient beta.
Approach looks reasonable to me.
Have q on how this works with sidecars given istio-cni is also used in sidecars to eliminate certain network privileges on the app pod deployer.
Also, how this works with sidecar and ambient co-exist where user may or may not use istio-cni for sidecar?
This should work find for sidecars as well, it's just not needed for sidecars today as the init container solves that problem in sidecars
ok I think this is ready for another round. cc @howardjohn @linsun @bleggett
Two cents
-
I donot think taint is a good solution, in a k8s cluster there maybe many system components like istio installed. We cannot make them all depend on istio.
-
There is a chance that cni down and restarted, during this duration, applications still have chance to be not injected.
Two cents
1. I donot think taint is a good solution, in a k8s cluster there maybe many system components like istio installed. We cannot make them all depend on istio. 2. There is a chance that cni down and restarted, during this duration, applications still have chance to be not injected.
-
We plan to have 2 solutions, this taint controller, and an init container. The init container requires a mutating web-hook which is not idle for some users - so we want to have this as alternative option.
-
Yes, but these events are (usually) controlled by the user. The taint controller is here to prevent races when a new node is added to the cluster, something that the user doesn't control.
The init container requires a mutating web-hook which is not idle for some users
Can you elaborate on it
The init container requires a mutating web-hook which is not idle for some users
Can you elaborate on it
This is the init container approach: https://github.com/istio/istio/pull/49092 See that the it requires the mutating webhook to inject an init container
This should be ready for another round of reviews.
One question i have is that currently we need to configure this in 2 places:
pilot:
taint:
enabled: true
namespace: "kube-system"
env:
PILOT_ENABLE_NODE_UNTAINT_CONTROLLERS: "true"
One in the pilot values, and once as an env var to enabled the controller. I was hoping to just do the pilot values. but that would mean that enabling the controller would require a command line arg to pilot, which doesn't seem to be the common way to enable optional features. Was hoping to get some direction of what's preferable conventions-wise.
cc @bleggett @hzxuzhonghu @howardjohn
@yuval-k we could have PILOT_NODE_UNTAINT_CONTROLLERS_NAMESPACE=.... Unset=disabled, set sets the namespace. Then its one config value. In the rbac we can check the env var too probably?
I would prefer to avoid CLI flag since we have moved away from that for various reasons, but other than that I am not super opinionated on the rest
not stale - still need this after @yuval-k has time to rebase
@jacob-delgado, any chance you'd be able to review these values.yaml changes?
Hi @ilrudie i reviewed the values.yaml quickly, given that is our API, I'd recommend some clarification for the newly added fields so users can easily understand how to use this feature.