vcluster icon indicating copy to clipboard operation
vcluster copied to clipboard

[bug] Node tolerations are propagated to underlying k8s system

Open mtougeron opened this issue 4 years ago • 6 comments

When creating a deployment in vCluster I would expect the tolerations to apply to the 'fake' kubernetes nodes not the ones from the underlying k8s infrastructure. For example, we have an OPA policy that prohibits users from deploying workloads that tolerate running on the control plane nodes outside of the "system" namespaces. When a user tries to launch a vCluster it launches a coredns deployment that has the toleration of node-role.kubernetes.io/master. This would allow the pod to be launched on the real kubernetes master node(s) not just the 'fake' one.

mtougeron avatar Dec 13 '21 22:12 mtougeron

@mtougeron thanks for creating this issue! This will be solved with the newer v0.5.0 version which is already available to try out at v0.5.0-alpha.7, that should not have this toleration anymore.

FabianKramm avatar Dec 14 '21 08:12 FabianKramm

Ah great. I'll try and test that out today!

mtougeron avatar Dec 14 '21 16:12 mtougeron

I tested this out with 0.5.0-alpha.7 and the tolerations are still propagated to the underlying kubernetes system. I see where the coredns can now be modified in this repo to not have the tolerations (lmk if you want me to change that).

I think what I was expecting was that the tolerations would be rewritten like the pod affinity label selectors are so that it tolerates the 'fake' nodes not the real nodes.

mtougeron avatar Dec 14 '21 17:12 mtougeron

@mtougeron ah, I see, we have a feature called node-selector for this where you can scope the vcluster workloads to certain nodes (see https://www.vcluster.com/docs/architecture/nodes), which is similar to what you want. Regarding adding tolerations to pods to only tolerate the fake nodes, this might be a little bit of a chicken egg problem as fake nodes are created as soon as a pod is bound to a node. vcluster also has no information about the node since it does not have appropriate RBAC permissions to access node information, so vcluster will just make a node up, when fake nodes are enabled. When using node selector, vcluster will instead sync all nodes that match the selector and also sync the correct node labels, taints etc., which might then be better suited for your use case.

FabianKramm avatar Dec 14 '21 19:12 FabianKramm

this might be a little bit of a chicken egg problem

I see what you mean by this. It might just need the ability to remove the tolerations from the manifests/coredns/coredns.yaml so the vCluster fully bootstraps and then we'll handle the rest in our ci pipeline's manipulation of the manifests.

which might then be better suited for your use case.

fwiw, the goal I'm working on is to be able to test manifest changes for our production clusters inside a vCluster. This would allow non-privileged users to contribute to the code base without having to build their own k8s clusters to test manifest changes. I've been attempting to do this without having to change anything in our manifests and so far the only problems I've run into are the PSPs (which I don't expect vCluster to solve), tolerations and hostPort mappings. :)

mtougeron avatar Dec 14 '21 19:12 mtougeron

@mtougeron that makes sense, we could also think about removing those tolerations by default in the coredns deployment as vcluster should use as less permissions as possible by default. You are also able to adjust coredns configuration by mounting a different set of yamls into the vcluster pod overriding those manifests.

Thanks for explaining your use case! Besides changing the coredns yamls, I guess PSP, tolerations and hostPort mappings could be also solved by a mutating webhook in the host namespace, a good example for that is kyverno, gatekeeper or our own policy engine jsPolicy.

FabianKramm avatar Dec 14 '21 20:12 FabianKramm

The tolerations on the CoreDNS pods were removed. I am also adding a note to the docs to warn users that tolerations are synced with the pods. So I think this can be resolved once the docs update is merged.

matskiv avatar Nov 07 '22 09:11 matskiv