virt-handler: Restrict write access to nodes
This PR addresses a security issue [0][1] that exists because virt-handler is a daemonset that can patch the node. This proposal presents possible approaches to reduce virt-handler's node-patching abilities and mitigate this issue.
[0] https://github.com/kubevirt/kubevirt/security/advisories/GHSA-cp96-jpmq-xrr2 [1] https://github.com/kubevirt/kubevirt/issues/9109
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
Hey @xpivarc, I have a question. If we create a shadow node that the virt-handler writes to, then what prevents an attacker who took control of virt-handler from writing to the shadow nodes of other Nodes? Couldn't this attacker still take control of the cluster by manipulating them?
To a degree yes, but it can only influence kubevirt.io/ scheduling constraints, which solves the most important problem: that one can lure in CP workloads. VMs can still be tricked. If (and I am not sure if we get that info on an AdmissionReview) we also get the IP address in the userinfo when eventually switching to cert-based auth (which would contain the client IP, like kubelet certs), we can also enforce by webhook that it can only write to its own node in a next step. That would now be possible because the webhook would only block our shadow nodes and not a core k8s entity.
@RamLavi 100% what Roman mentioned. We have control over what we propagate and we can also solve the problem of impersonation if we want because the scalability will not be an issue anymore.
/hold
I think there is no clear agreement if this approach is the right one.
I'm putting a hold on it to signal that this requires more discussion and an agreed design.
This is important, as @RamLavi is working on an implementation.
Change: Addressing reviewers' comments. @alaypatel07 @rmohr @xpivarc @fabiand PTAL
@RamLavi will you be updating this spec according to Roman's comment about a real spec?
Hey, I wasn't online this past week (was sick, now better), so I'm not sure what spec update is needed. Will take a closer look later today
Note: I am considering a pivot from the suggested solution. I will reorder and present the possible solutions so that we can discuss the best solution and continue from there.
Another solution has been raised: going back to the original need to use the node to communicate information, e.g. the heartbeat and the way HW/SW details are exposed on the node. Are you going to consider these as well in the summary/comparison?
If you are referring to the device-mgr/NFD solution then yes
Updated the proposal to present all the options discussed, in order to allow open discussion.
I was expecting that by now there would be more discussion on this topic. Doing this right is important, and would make it easier for our future selves to maintain.
Added a new option suggested by @xpivarc (thanks!)
Change: Addressed @xpivarc 's comments.
@xpivarc can we change the PR title and maybe add a PR desc?
The PR title should be: virt-handler: Restrict write access to nodes
The PR desc should be:
This PR addresses a security issue [0][1] that exists because virt-handler is a daemonset that can patch the node. This proposal presents possible approaches to reduce virt-handler's node-patching abilities and mitigate this issue.
[0] https://github.com/kubevirt/kubevirt/security/advisories/GHSA-cp96-jpmq-xrr2
[1] https://github.com/kubevirt/kubevirt/issues/9109
Thanks @RamLavi for the exhaustive analysis. I believe we have exhausted all of our options. The order of the options matches my preference, but option 1 needs a little clarification. The policies should be GA, but the additional info that we need for the validation to work is feature-gated itself. It is not clear to me which feature gate it is and what its state is. Nevertheless, it would be an option per cluster configuration.
@rmohr @alaypatel07 @EdDev PTAL
While I have a non-blocking preference for the original proposal (new CRD) from an architectural perspective of where I would want to go, option 1 seems to be the simple low-hanging fruit which we should take. +1
Here is why I would still want to explore option 2 in this and other contexts a little bit more later:
- solves the heartbeat update as well
- allows exchanging capabilities in a more structured way (node policies could later be applied, matching shadow nodes, giving virt-handler information)
- the webhook protection can still be applied to the "shadow node" (that was part 2 of the improvement, where the CRD instruction would be part 1)
- if a CP node with virt-controller (which can continue to modify node objects either way) is taken over, people have so many privileges that it does not matter if virt-controller can change the node
On the plus side for the pure restriction option: letting virt-handler just add the labels is simple, which I really like.
They are also not mutually exclusive: we could still add a new CRD for specific use cases while just restricting for now, if things look promising from a feature GA perspective.
So from that perspective I am ok with option 1, would just love to see some follow-up thoughts on the points above.
+1 I updated the proposal to be clearer and also to discuss other considerations like backport compatibility. PTAL.
I agree that we should continue to explore the need to introduce such a CRD to better define the communication between virt-handler and other components. However, as of now, I don't think we have enough reason to add one. Let me elaborate:
- The heartbeat can be replaced by other logic that achieves the same results, such as liveness and readiness probes backed by a controller that watches the virt-handler DaemonSet status (see the sketch after this list). I plan to check this and submit a proposal in the future so that we can further debate it there.
- We still don't have concrete plans for node policies. When we do, I would be all for considering this all over again.
- I'm not sure we need the webhook if we use this option. Can you elaborate?
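For illustration, here is a minimal Go sketch of what the first bullet could look like: instead of virt-handler stamping a heartbeat onto its node object, a controller derives per-node handler health from the readiness of the DaemonSet's pods. The namespace and label selector here are assumptions, not the final design.

```go
package heartbeat

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readyHandlerNodes derives the set of nodes that have a ready virt-handler
// pod by listing the DaemonSet's pods, instead of virt-handler writing a
// heartbeat onto the node. Namespace and label selector are assumptions.
func readyHandlerNodes(ctx context.Context, c kubernetes.Interface) (map[string]bool, error) {
	pods, err := c.CoreV1().Pods("kubevirt").List(ctx, metav1.ListOptions{
		LabelSelector: "kubevirt.io=virt-handler",
	})
	if err != nil {
		return nil, fmt.Errorf("listing virt-handler pods: %w", err)
	}
	ready := map[string]bool{}
	for _, pod := range pods.Items {
		for _, cond := range pod.Status.Conditions {
			// A ready pod stands in for the old heartbeat label on the node.
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready[pod.Spec.NodeName] = true
			}
		}
	}
	return ready, nil
}
```

A controller running with virt-controller's privileges (which keep node write access either way, as noted above) could then reconcile any schedulability marker from this map.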
- Updated the chosen solution to ValidatingAdmissionPolicy, following the positive feedback and a POC done offline.
- Updated the backwards-compatibility approach and added an appendix to further discuss it.
Note: This proposal is now mature enough and is waiting for approval.
Thanks for the update Ram.
@RamLavi please resolve all open conversations in this document in order to move forward.
Technically speaking, I don't think I can resolve conversations I didn't open, but I made sure I answered them.
Thanks a lot @RamLavi for this proposal! As said above, I vote +1 to the Validating Admission Policy approach.
/lgtm
In general, the ShadowNode approach with the variation that @Barakmor1 suggested would be my preference.
My main fear is that a Validating Admission Policy for virt-handler will make debugging, when something goes wrong, challenging (to say the least).
Moreover, it seems like a missed opportunity that we are not taking care of all the labels virt-handler is setting on the node - that is, not isolating them on a separate object.
However, I'm not blocking this - especially if @iholder101 and @jean-edouard are happy to support it. So /lgtm from me.
Hey @vladikr! Before merging this, let's make sure we discuss all the points you've raised.
In general, the ShadowNode approach with the variation that @Barakmor1 suggested would be my preference.
@Barakmor1 and I were discussing this approach offline. One of the problems with this approach is that pods are (mostly) immutable, and we can't change affinities nor .spec.nodeName after the pod is created.
The steps @Barakmor1 was mentioning above are:
1. virt-handler is deployed and sets the ShadowNode objects with the proper annotations and labels, just like you mentioned.
2. Whenever a VMI is created, virt-api will mutate the virt-launcher pod with a dedicated SchedulingGate.
3. virt-controller will detect virt-launcher pods with the SchedulingGate and do the following for each launcher pod:
   - Calculate suitable nodes for scheduling from the kubevirt perspective; let's mark it as <nodeList>.
   - Remove the SchedulingGate from the launcher pod and add node affinity to allow scheduling only to nodes in <nodeList>.
Unfortunately, the last bullet under (3) is not possible, since after a pod is created, we simply cannot set new affinities.
As you rightfully said, we can skip the step of adding a scheduling gate. But even then we're facing corner cases, such as:
- A virt-launcher pod is created, targeting a single node N1 that fits.
- Just before the pod is scheduled, N1 is being removed from the cluster.
- Right after that, a new node N2 (that is fitting for the launcher pod) joins the cluster.
- The pod remains pending forever.
In this (and similar) cases we would have to delete the pod and re-create it.
Bottom line is: to support such an approach we'd have to introduce a mechanism that monitors the pending pods and re-creates them when necessary. Let's continue discussing this approach further if you think it's viable/valuable.
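To make the immutability constraint concrete, here is a minimal sketch of the gated virt-launcher pod that steps (2) and (3) assume; the gate name and image are hypothetical. As discussed above, once such a pod exists we cannot rewrite its affinity or .spec.nodeName, so <nodeList> would effectively have to be final at creation time.

```go
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// gatedLauncherPod sketches step (2): virt-api creates the virt-launcher pod
// with a scheduling gate so the scheduler ignores it until virt-controller
// lifts the gate. The gate name is hypothetical. The catch from step (3), per
// the discussion above: once the pod exists, its affinity and .spec.nodeName
// cannot be rewritten.
func gatedLauncherPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "virt-launcher-example"},
		Spec: corev1.PodSpec{
			SchedulingGates: []corev1.PodSchedulingGate{
				{Name: "kubevirt.io/node-selection-pending"},
			},
			Containers: []corev1.Container{
				{Name: "compute", Image: "quay.io/kubevirt/virt-launcher:example"},
			},
		},
	}
}
```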
My main fear is that a Validating Admission Policy for virt-handler will make debugging, when something goes wrong, challenging (to say the least)
That's a fair point. @xpivarc Any thoughts?
Moreover, it seems like a missed opportunity that we are not taking care of all the labels virt-handler is setting on the node - that is, not isolating them on a separate object.
I 100% agree. This would definitely be ideal. The sad truth is that the vast vast majority of our ~200 labels are completely static in nature, and it's very tempting to save them elsewhere (either the shadow node CRD or even a configmap).
What concerns me around this is that we essentially have two options AFAICT:
- label the node, create virt-launcher (i.e. what we're doing now), let the scheduler handle scheduling.
- as said above, create a mechanism to monitor and re-create the launcher pod whenever necessary. Essentially, we get into the business of scheduling.
@vladikr what are your thoughts regarding that? Again if you think it's a reasonable path forward, let's discuss it!
I want to stress again that I am all for any change that will reduce patches to the node. IMO virt-handler should not patch the node at all. But the fact is that it does patch it (even if we remove the node-labeller), and the most direct way to solve the CVE is to add a VAP. Can we agree to:
- Add the VAP as part of this proposal, solving the CVE (a sketch follows below).
- In parallel, propose ways to remove all the different scenarios where virt-handler patches the node (there is more than just the node-labeller). For example, I plan to POC the removal of the heartbeat mechanism. Ways to remove other patches were already suggested on this PR (see solution 3); more suggestions are always welcome.
- Once all patches are gone, we can say goodbye to this VAP, and also to the RBAC permission to patch the node.
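For reference, here is a minimal sketch of what such a VAP could look like, built with the client-go types. The policy name, service-account name, and CEL expressions are simplifications for illustration (annotations, e.g. for the heartbeat, would need analogous rules); this is not the exact policy proposed in the design:

```go
package policy

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// nodeRestrictionPolicy sketches a ValidatingAdmissionPolicy that only lets
// virt-handler touch kubevirt.io labels on nodes. All names are illustrative.
func nodeRestrictionPolicy() *admissionregistrationv1.ValidatingAdmissionPolicy {
	failurePolicy := admissionregistrationv1.Fail
	return &admissionregistrationv1.ValidatingAdmissionPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "kubevirt-node-restriction"},
		Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
			FailurePolicy: &failurePolicy,
			// Only evaluate requests coming from the virt-handler service account.
			MatchConditions: []admissionregistrationv1.MatchCondition{{
				Name:       "virt-handler-only",
				Expression: `request.userInfo.username == "system:serviceaccount:kubevirt:kubevirt-handler"`,
			}},
			MatchConstraints: &admissionregistrationv1.MatchResources{
				ResourceRules: []admissionregistrationv1.NamedRuleWithOperations{{
					RuleWithOperations: admissionregistrationv1.RuleWithOperations{
						Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Update},
						Rule: admissionregistrationv1.Rule{
							APIGroups:   []string{""},
							APIVersions: []string{"v1"},
							Resources:   []string{"nodes"},
						},
					},
				}},
			},
			Validations: []admissionregistrationv1.Validation{
				{
					// The node spec must not change at all.
					Expression: `object.spec == oldObject.spec`,
					Message:    "virt-handler may not modify the node spec",
				},
				{
					// Labels may only be added or changed under kubevirt.io keys...
					Expression: `object.metadata.labels.all(k, k.contains("kubevirt.io") || (k in oldObject.metadata.labels && object.metadata.labels[k] == oldObject.metadata.labels[k]))`,
					Message:    "virt-handler may only add or change kubevirt.io labels",
				},
				{
					// ...and may only be removed under kubevirt.io keys.
					Expression: `oldObject.metadata.labels.all(k, k.contains("kubevirt.io") || k in object.metadata.labels)`,
					Message:    "virt-handler may only remove kubevirt.io labels",
				},
			},
		},
	}
}
```

Enforcement additionally requires a ValidatingAdmissionPolicyBinding that references the policy; ValidatingAdmissionPolicy itself is behind a feature gate as v1beta1 in Kubernetes 1.28 and GA (v1) as of 1.30, which is the cluster-version dependency discussed above.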
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: vladikr
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [vladikr]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/hold cancel
@fabiand Seems we have agreement; I hope I did not miss anyone. Thanks everyone for the discussion, hopefully we can do more than fix CVEs in the future ;)
Awesome
10x
🚀
@xpivarc we still need to follow up to limit a handler's powers to the workloads on its own node. Right? Can VAP help here as well?
Yes, the VAP could be used here, but it is not a strict requirement as we still have webhooks for the resources. Therefore it is a matter of extending them to check that the right handler is making the request.
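For completeness, here is a sketch of how that check could look as an additional VAP validation, assuming the API server exposes the node claim of bound service account tokens (the authentication.kubernetes.io/node-name userInfo extra available on recent Kubernetes releases):

```go
package policy

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
)

// ownNodeOnlyValidation is a hypothetical extension of the VAP above: it ties
// a node update to the node the requesting virt-handler actually runs on,
// assuming the node-name claim of bound service account tokens is available.
func ownNodeOnlyValidation() admissionregistrationv1.Validation {
	return admissionregistrationv1.Validation{
		Expression: `"authentication.kubernetes.io/node-name" in request.userInfo.extra && ` +
			`request.userInfo.extra["authentication.kubernetes.io/node-name"][0] == object.metadata.name`,
		Message: "virt-handler may only modify its own node",
	}
}
```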