istio icon indicating copy to clipboard operation
istio copied to clipboard

Untaint controller

Open yuval-k opened this issue 1 year ago • 13 comments

This controller runs from istiod and removes not-ready taints from nodes that have CNI pods ready.

Part of https://github.com/istio/istio/issues/48286. The idea is that the user configures the infra provider to add a taint on new nodes, and the untaint controller removes the taints from the nodes as soon as it observes a ready cni pod on the node.

This is a draft as I still have some questions about the code, and wanted some feedback:

  • To enable the taint controller, I followed the ambient controller example and used as environment variable. But, there's other configuration options for the taint controller, that I currently pass as pilot command line args. I'm wondering if there's a better way to have all config in one place? (I suspect we'll need to add more options in the future - customizing the name and effect of the taint)
  • The CNI might be deployed to a different namespace. I added a helm value in the pilot chart for that. I would be nice if we could re-use the value for components.cni.namespace so there's no redundancy, but i'm not clear how does the operator spec gets translated into helm charts/values?
  • The cni daemonset had a revision label, but the pod template didn't have. Is that intentional? - I added revision label for the pods so for consistency (as i need to watch said pods).
  • I wrote an integration test in its own package as I need to install istio with the taint controller enabled. Would be good to know if the way I did it makes sense.

yuval-k avatar Jan 16 '24 16:01 yuval-k

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

istio-testing avatar Jan 16 '24 16:01 istio-testing

😊 Welcome @yuval-k! This is either your first contribution to the Istio istio repo, or it's been a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

istio-policy-bot avatar Jan 16 '24 16:01 istio-policy-bot

The cni daemonset had a revision label, but the pod template didn't have. Is that intentional? - I added revision label for the pods so for consistency (as i need to watch said pods).

No, it's not, revisions have no meaning applied to the CNI DS directly ATM.

bleggett avatar Jan 16 '24 16:01 bleggett

Would we ever want to potentially re-taint the node if CNI goes unready, or is this by-design a one-way operation?

Strictly speaking it's not required since if the CNI was down the scheduling of new pods would die anyway (with the inpod model), but it might be cleaner/nicer from a user perspective to lean on k8s scheduling instead, if we're already doing all this anyway.

bleggett avatar Jan 16 '24 17:01 bleggett

Would we ever want to potentially re-taint the node if CNI goes unready, or is this by-design a one-way operation?

Strictly speaking it's not required since if the CNI was down the scheduling of new pods would die anyway (with the inpod model), but it might be cleaner/nicer from a user perspective to lean on k8s scheduling instead, if we're already doing all this anyway.

I believe the unavoidable race is a one time thing, as the user doesn't control how a node is added to the cluster.

If the CNI becomes not ready due to a crash of the cni agent pod for example - the cni binary and config are still present, so it's not a problem for this controller to solve.

There might be a role to play here as part of CNI upgrade (as CNI removes its binaries when the CNI pod is stopped), but i think we need to have an holistic upgrade discussion with all the moving parts (including ztunnel), so I consider it out of scope for this PR.

yuval-k avatar Jan 16 '24 20:01 yuval-k

@hzxuzhonghu or @howardjohn can you review at high level to provide some early feedback to Yuval? this is identified as a key blocker for ambient beta.

linsun avatar Jan 22 '24 22:01 linsun

Approach looks reasonable to me.

Have q on how this works with sidecars given istio-cni is also used in sidecars to eliminate certain network privileges on the app pod deployer.

Also, how this works with sidecar and ambient co-exist where user may or may not use istio-cni for sidecar?

This should work find for sidecars as well, it's just not needed for sidecars today as the init container solves that problem in sidecars

yuval-k avatar Jan 29 '24 15:01 yuval-k

ok I think this is ready for another round. cc @howardjohn @linsun @bleggett

yuval-k avatar Jan 29 '24 21:01 yuval-k

Two cents

  1. I donot think taint is a good solution, in a k8s cluster there maybe many system components like istio installed. We cannot make them all depend on istio.

  2. There is a chance that cni down and restarted, during this duration, applications still have chance to be not injected.

hzxuzhonghu avatar Jan 30 '24 07:01 hzxuzhonghu

Two cents

1. I donot think taint is a good solution, in a k8s cluster there maybe many system components like istio installed. We cannot make them all depend on istio.

2. There is a chance that cni down and restarted, during this duration, applications still have chance to be not injected.
  1. We plan to have 2 solutions, this taint controller, and an init container. The init container requires a mutating web-hook which is not idle for some users - so we want to have this as alternative option.

  2. Yes, but these events are (usually) controlled by the user. The taint controller is here to prevent races when a new node is added to the cluster, something that the user doesn't control.

yuval-k avatar Jan 30 '24 12:01 yuval-k

The init container requires a mutating web-hook which is not idle for some users

Can you elaborate on it

hzxuzhonghu avatar Jan 31 '24 03:01 hzxuzhonghu

The init container requires a mutating web-hook which is not idle for some users

Can you elaborate on it

This is the init container approach: https://github.com/istio/istio/pull/49092 See that the it requires the mutating webhook to inject an init container

yuval-k avatar Jan 31 '24 11:01 yuval-k

This should be ready for another round of reviews.

One question i have is that currently we need to configure this in 2 places:

  pilot:
    taint:
      enabled: true
      namespace: "kube-system"
    env:
      PILOT_ENABLE_NODE_UNTAINT_CONTROLLERS: "true"

One in the pilot values, and once as an env var to enabled the controller. I was hoping to just do the pilot values. but that would mean that enabling the controller would require a command line arg to pilot, which doesn't seem to be the common way to enable optional features. Was hoping to get some direction of what's preferable conventions-wise.

cc @bleggett @hzxuzhonghu @howardjohn

yuval-k avatar Feb 26 '24 18:02 yuval-k

@yuval-k we could have PILOT_NODE_UNTAINT_CONTROLLERS_NAMESPACE=.... Unset=disabled, set sets the namespace. Then its one config value. In the rbac we can check the env var too probably?

I would prefer to avoid CLI flag since we have moved away from that for various reasons, but other than that I am not super opinionated on the rest

howardjohn avatar Feb 27 '24 22:02 howardjohn

not stale - still need this after @yuval-k has time to rebase

linsun avatar Mar 12 '24 17:03 linsun

@jacob-delgado, any chance you'd be able to review these values.yaml changes?

ilrudie avatar Apr 12 '24 14:04 ilrudie

Hi @ilrudie i reviewed the values.yaml quickly, given that is our API, I'd recommend some clarification for the newly added fields so users can easily understand how to use this feature.

linsun avatar Apr 12 '24 14:04 linsun