
virt-handler: Restrict write access to nodes

Open · xpivarc opened this pull request 2 years ago • 17 comments

This PR addresses a security issue [0][1] that exists because virt-handler is a DaemonSet that can patch the node. This document presents possible approaches to reduce virt-handler's node-patching abilities and mitigate the issue.

[0] https://github.com/kubevirt/kubevirt/security/advisories/GHSA-cp96-jpmq-xrr2 [1] https://github.com/kubevirt/kubevirt/issues/9109

xpivarc avatar Oct 10 '23 16:10 xpivarc

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

kubevirt-bot avatar Oct 10 '23 16:10 kubevirt-bot

Hey @xpivarc, I have a question. If we create a shadow node that the virt-handler writes to, then what prevents a bad actor who took control of virt-handler from writing to the shadow nodes of other Nodes? Couldn't this bad actor still take control of the cluster by manipulating them?

To a degree, yes, but it can only influence kubevirt.io/ scheduling constraints, which solves the most important problem: that one can lure in CP workloads. VMs can still be tricked. If (and I am not sure we get that info on an AdmissionReview) we also get the IP address in the userInfo when eventually switching to cert-based auth (which would contain the client IP, like kubelet certs), we can, as a next step, also enforce by webhook that it can only write to its own node. That would now be possible because the webhook would only block our shadow nodes and not a core k8s entity.

rmohr avatar Jan 30 '24 16:01 rmohr

@RamLavi 100% what Roman mentioned. We have control over what we propagate and we can also solve the problem of impersonation if we want because the scalability will not be an issue anymore.

xpivarc avatar Feb 04 '24 08:02 xpivarc

/hold

I think there is no clear agreement if this approach is the right one.

I'm putting a hold on it to signal that this requires more discussion and an agreed design.

This is important, as @RamLavi is working on an implementation.

fabiand avatar Feb 28 '24 16:02 fabiand

Change: Addressing reviewers' comments. @alaypatel07 @rmohr @xpivarc @fabiand PTAL

RamLavi avatar Mar 03 '24 20:03 RamLavi

@RamLavi will you be updating this spec according to Roman's comment about a real spec?

fabiand avatar Mar 12 '24 09:03 fabiand

@RamLavi will you be updating this spec according to Roman's comment about a real spec?

Hey, I wasn't online this past week (was sick, now better), so I'm not sure what spec update is needed. I will take a closer look later today.

RamLavi avatar Mar 12 '24 13:03 RamLavi

Note: I am considering a pivot from the suggested solution. I will reorder and present the possible solutions so that we can discuss the best solution and continue from there.

RamLavi avatar Mar 21 '24 19:03 RamLavi

Note: I am considering a pivot from the suggested solution. I will reorder and present the possible solutions so that we can discuss the best solution and continue from there.

Another solution has been raised: going back to the original need to use the node to communicate information, e.g. the need for the heartbeat and the way HW/SW details are exposed on the node. Are you going to consider these as well in the summary/comparison?

EdDev avatar Mar 25 '24 05:03 EdDev

Note: I am considering a pivot from the suggested solution. I will reorder and present the possible solutions so that we can discuss the best solution and continue from there.

Another solution has been raised: going back to the original need to use the node to communicate information, e.g. the need for the heartbeat and the way HW/SW details are exposed on the node. Are you going to consider these as well in the summary/comparison?

If you are referring to the device-mgr/NFD solution then yes

RamLavi avatar Mar 25 '24 06:03 RamLavi

Updated the proposal to exhibit all the options discussed. This is done in order to allow open discussion.

RamLavi avatar Mar 27 '24 08:03 RamLavi

Updated the proposal to exhibit all the options discussed. This is done in order to allow open discussion.

I was expecting that by now there would be more discussion on this topic. Getting this right is important, and it will make it easier for our future selves to maintain.

RamLavi avatar Apr 01 '24 06:04 RamLavi

Added a new option suggested by @xpivarc (thanks!)

RamLavi avatar Apr 15 '24 14:04 RamLavi

Change: Addressed @xpivarc 's comments.

@xpivarc can we change the PR title and maybe add a PR desc?

The PR title should be: "virt-handler: Restrict write access to nodes"
The PR desc should be:

This PR addresses a security issue [0][1] that exists because virt-handler is a DaemonSet that can patch the node. This document presents possible approaches to reduce virt-handler's node-patching abilities and mitigate the issue.

[0] https://github.com/kubevirt/kubevirt/security/advisories/GHSA-cp96-jpmq-xrr2
[1] https://github.com/kubevirt/kubevirt/issues/9109

RamLavi avatar Apr 16 '24 12:04 RamLavi

Thanks @RamLavi for the exhaustive analysis. I believe we have exhausted all of our options. The order of the options matches my preference, but option 1 needs a little bit of clarification: the policies should be GA, but the additional info that we need for the validation to work is itself feature-gated. It is not clear to me which feature gate it is and what its state is. Nevertheless, it would be an option per cluster configuration.

@rmohr @alaypatel07 @EdDev PTAL

While, from an architectural perspective, I have a non-blocking preference for the original proposal (a new CRD) as where I would want to go, option 1 seems to be the simple low-hanging fruit that we should take. +1

Here is why I would still want to explore option 2 a little bit more later, in this and other contexts:

  • solves the heartbeat update as well
  • allows exchanging capabilities in a more structured way (node policies could later be applied, matching shadow nodes, giving virt-handler information)
  • the webhook protection can still be applied to the "shadow node" (that was part 2 of the improvement, where the CRD instruction would be part 1)
  • if a CP node running virt-controller (which can continue to modify node objects either way) is taken over, the attacker already has so many privileges that it does not matter whether virt-controller can change the node

On the plus side for the pure restriction option: letting virt-handler just add the labels is simple, which I really like.

They are also not mutually exclusive: we could still add a new CRD for specific use cases while just restricting for now, if things look promising from a feature-GA perspective.

So from that perspective I am ok with option 1, would just love to see some follow-up thoughts on the points above.

rmohr avatar Apr 17 '24 15:04 rmohr
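For context, the shadow-node route discussed above could look roughly like the following hypothetical sketch. The API group, version, kind, and every field name here are illustrative assumptions, not an agreed schema:

```yaml
# Hypothetical ShadowNode object. virt-handler would write only here;
# virt-controller would propagate the kubevirt.io-prefixed labels to the
# real Node. Group/version, kind, and field names are illustrative only.
apiVersion: kubevirt.io/v1alpha1
kind: ShadowNode
metadata:
  name: node01                     # mirrors the Node of the same name
spec: {}                           # could later carry node policies
status:
  labels:
    kubevirt.io/schedulable: "true"
    cpu-model.node.kubevirt.io/Haswell: "true"
  lastHeartbeatTime: "2024-04-17T15:00:00Z"   # would replace the node heartbeat patch
```

The webhook protection mentioned in the bullets above would then target this object instead of the core Node, so a compromised virt-handler could at worst corrupt kubevirt.io scheduling hints.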

Thanks @RamLavi for the exhaustive analysis. I believe we have exhausted all of our options. The order of the options matches my preference, but option 1 needs a little bit of clarification: the policies should be GA, but the additional info that we need for the validation to work is itself feature-gated. It is not clear to me which feature gate it is and what its state is. Nevertheless, it would be an option per cluster configuration. @rmohr @alaypatel07 @EdDev PTAL

While, from an architectural perspective, I have a non-blocking preference for the original proposal (a new CRD) as where I would want to go, option 1 seems to be the simple low-hanging fruit that we should take. +1

+1 I updated the proposal to be clearer and to also discuss other considerations like backport compatibility. PTAL.

Here is why I would still want to explore option 2 a little bit more later, in this and other contexts:

  • solves the heartbeat update as well
  • allows exchanging capabilities in a more structured way (node policies could later be applied, matching shadow nodes, giving virt-handler information)
  • the webhook protection can still be applied to the "shadow node" (that was part 2 of the improvement, where the CRD instruction would be part 1)
  • if a CP node running virt-controller (which can continue to modify node objects either way) is taken over, the attacker already has so many privileges that it does not matter whether virt-controller can change the node

On the plus side for the pure restriction option: letting virt-handler just add the labels is simple, which I really like.

They are also not mutually exclusive: we could still add a new CRD for specific use cases while just restricting for now, if things look promising from a feature-GA perspective.

So from that perspective I am ok with option 1, would just love to see some follow-up thoughts on the points above.

I agree that we should continue to explore the need to introduce such a CRD to better define communication between virt-handler to other components. However as of now I think that we don't have enough reason to add one. Let me elaborate:

  • heartbeat can be replaced by other logic that would achieve the same result, such as liveness and readiness probes, backed by a controller that watches the virt-handler DaemonSet status. I plan to check this and submit a proposal in the future so that we can debate it further there.
  • we still don't have concrete plans for node policies. When we have such plans, I would be all for considering this again.
  • I'm not sure we need the webhook if we use this option. Can you elaborate?

RamLavi avatar Apr 18 '24 17:04 RamLavi
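As a rough illustration of the probe-based heartbeat replacement mentioned in the first bullet above: instead of virt-handler patching a heartbeat onto the Node, a controller could watch pod readiness on the DaemonSet. All names, the endpoint path, and the port below are assumptions, not the actual virt-handler configuration:

```yaml
# Hypothetical sketch: pod readiness stands in for the node-heartbeat patch.
# A controller watching this DaemonSet's pod conditions could derive the
# same "is virt-handler alive on this node" signal.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: virt-handler
  namespace: kubevirt
spec:
  selector:
    matchLabels:
      kubevirt.io: virt-handler
  template:
    metadata:
      labels:
        kubevirt.io: virt-handler
    spec:
      containers:
        - name: virt-handler
          image: quay.io/kubevirt/virt-handler:latest   # illustrative image
          readinessProbe:            # periodic probe replaces the heartbeat
            httpGet:
              path: /healthz        # hypothetical endpoint
              port: 8443
              scheme: HTTPS
            periodSeconds: 10
            failureThreshold: 3
```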

Change:

  • updated the chosen solution: ValidatingAdmissionPolicy, following the positive feedback and a POC done offline.
  • updated the backwards-compatibility approach and added an appendix to further discuss it.

Note: This proposal is now mature enough and is waiting for approval.

RamLavi avatar May 08 '24 08:05 RamLavi
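For illustration, restricting virt-handler's node writes via a ValidatingAdmissionPolicy could look roughly like the following. This is a minimal sketch under stated assumptions, not the merged implementation: the object names, the service-account name, and the single-prefix label check are all assumptions (a real policy would also have to cover label removals, annotations, and subdomain prefixes such as cpu-model.node.kubevirt.io/):

```yaml
# Sketch only: names and the CEL expression are illustrative assumptions.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: restrict-virt-handler-node-writes   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["UPDATE"]   # node PATCH requests surface as UPDATE at admission
        resources: ["nodes"]
  matchConditions:
    # Only evaluate requests made by the virt-handler service account (assumed name).
    - name: is-virt-handler
      expression: request.userInfo.username == 'system:serviceaccount:kubevirt:kubevirt-handler'
  validations:
    # Every label that was added or changed must be kubevirt.io-prefixed.
    - expression: >-
        !has(object.metadata.labels) ||
        object.metadata.labels.filter(k,
          !has(oldObject.metadata.labels) ||
          !(k in oldObject.metadata.labels) ||
          oldObject.metadata.labels[k] != object.metadata.labels[k]
        ).all(k, k.startsWith('kubevirt.io'))
      message: "virt-handler may only modify kubevirt.io-prefixed node labels"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: restrict-virt-handler-node-writes-binding
spec:
  policyName: restrict-virt-handler-node-writes
  validationActions: ["Deny"]
```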

Thanks for the update Ram.

@RamLavi please resolve all open conversations in this document in order to move forward.

fabiand avatar May 08 '24 11:05 fabiand

Thanks for the update Ram.

@RamLavi please resolve all open conversations in this document in order to move forward.

Technically speaking, I don't think I can resolve conversations I didn't open, but I made sure I answered them.

RamLavi avatar May 08 '24 11:05 RamLavi

Thanks a lot @RamLavi for this proposal! As said above, I vote +1 to the Validating Admission Policy approach.

/lgtm

iholder101 avatar May 09 '24 14:05 iholder101

In general, the ShadowNode approach with the variation that @Barakmor1 suggested would be my preference. My main fear is that a Validating Admission Policy for virt-handler will make debugging, when something goes wrong, challenging (to say the least). Moreover, it seems like a missed opportunity that we are not taking care of all the labels virt-handler is setting on the node, that is, not isolating them on a separate object.

However, I'm not blocking this, especially if @iholder101 and @jean-edouard are happy to support it. So /lgtm from me.

vladikr avatar May 09 '24 14:05 vladikr

Hey @vladikr! Before merging this, let's make sure we discuss all the points you've raised.

In general, the ShadowNode approach with the variation that @Barakmor1 suggested, would be my preference.

@Barakmor1 and I were discussing this approach offline. One of the problems with it is that pods are (mostly) immutable: we cannot change affinities or .spec.nodeName after the pod is created.

The steps @Barakmor1 was mentioning above are:

  1. virt-handler is deployed and sets the ShadowNode Objects with the proper annotations and labels, just like you mentioned.
  2. Whenever a vmi is created, virt-api will mutate the virt-launcher pod with a dedicated SchedulingGate.
  3. virt-controller will detect virt-launcher pods with the SchedulingGate and do the following for each launcher pod:
    • Calculate suitable nodes for scheduling from the kubevirt perspective, let's mark it as <nodeList>.
    • Remove the SchedulingGate from the launcher pod and add node affinity to allow scheduling only to nodes in <nodeList>.

Unfortunately, the last bullet under (3) is not possible: after a pod is created, we simply cannot set new affinities.

As you rightfully said, we can skip the step of adding a scheduling gate. But even then we face corner cases, such as:

  • A virt-launcher pod is created, targeting a single node N1 that fits.
  • Just before the pod is scheduled, N1 is being removed from the cluster.
  • Right after that, a new node N2 (that is fitting for the launcher pod) joins the cluster.
  • The pod remains unscheduled forever.

In this (and similar) cases we would have to delete the pod and re-create it.

Bottom line is: to support such an approach we would have to introduce a mechanism that monitors pods stuck in scheduling and re-creates them when necessary. Let's continue discussing this approach further if you think it's viable/valuable.

My main fear is that a Validating Admission Policy for virt-handler will make debugging, when something goes wrong, challenging (to say the least).

That's a fair point. @xpivarc Any thoughts?

Moreover, it seems like a missed opportunity that we are not taking care of all the labels virt-handler is setting on the node, that is, not isolating them on a separate object.

I 100% agree. This would definitely be ideal. The sad truth is that the vast majority of our ~200 labels are completely static in nature, and it's very tempting to save them elsewhere (either in the shadow node CRD or even in a ConfigMap).

What concerns me around this is that we essentially have two options AFAICT:

  1. label the node, create virt-launcher (i.e. what we're doing now), let the scheduler handle scheduling.
  2. as said above, create a mechanism to monitor and re-create the launcher pod whenever necessary. Essentially, we get into the business of scheduling.

@vladikr what are your thoughts regarding that? Again if you think it's a reasonable path forward, let's discuss it!

iholder101 avatar May 09 '24 15:05 iholder101

Hey @vladikr! Before merging this, let's make sure we discuss all the points you've raised.

In general, the ShadowNode approach with the variation that @Barakmor1 suggested, would be my preference.

@Barakmor1 and I were discussing this approach offline. One of the problems with it is that pods are (mostly) immutable: we cannot change affinities or .spec.nodeName after the pod is created.

The steps @Barakmor1 was mentioning above are:

  1. virt-handler is deployed and sets the ShadowNode Objects with the proper annotations and labels, just like you mentioned.

  2. Whenever a vmi is created, virt-api will mutate the virt-launcher pod with a dedicated SchedulingGate.

  3. virt-controller will detect virt-launcher pods with the SchedulingGate and do the following for each launcher pod:

    • Calculate suitable nodes for scheduling from the kubevirt perspective, let's mark it as <nodeList>.
    • Remove the SchedulingGate from the launcher pod and add node affinity to allow scheduling only to nodes in <nodeList>.

Unfortunately, the last bullet under (3) is not possible: after a pod is created, we simply cannot set new affinities.

As you rightfully said, we can skip the step of adding a scheduling gate. But even then we face corner cases, such as:

  • A virt-launcher pod is created, targeting a single node N1 that fits.
  • Just before the pod is scheduled, N1 is being removed from the cluster.
  • Right after that, a new node N2 (that is fitting for the launcher pod) joins the cluster.
  • The pod remains unscheduled forever.

In this (and similar) cases we would have to delete the pod and re-create it.

Bottom line is: to support such an approach we would have to introduce a mechanism that monitors pods stuck in scheduling and re-creates them when necessary. Let's continue discussing this approach further if you think it's viable/valuable.

My main fear is that a Validating Admission Policy for virt-handler will make debugging, when something goes wrong, challenging (to say the least).

That's a fair point. @xpivarc Any thoughts?

Moreover, it seems like a missed opportunity that we are not taking care of all the labels virt-handler is setting on the node, that is, not isolating them on a separate object.

I 100% agree. This would definitely be ideal. The sad truth is that the vast majority of our ~200 labels are completely static in nature, and it's very tempting to save them elsewhere (either in the shadow node CRD or even in a ConfigMap).

What concerns me around this is that we essentially have two options AFAICT:

  1. label the node, create virt-launcher (i.e. what we're doing now), let the scheduler handle scheduling.
  2. as said above, create a mechanism to monitor and re-create the launcher pod whenever necessary. Essentially, we get into the business of scheduling.

@vladikr what are your thoughts regarding that? Again if you think it's a reasonable path forward, let's discuss it!

I want to stress again that I am all for any change that will reduce patches to the node. IMO virt-handler should not patch the node at all. But the fact is that it does patch it (even if we remove node-labeller), and the most direct way to solve the CVE is to add a VAP. Can we agree to:

  1. Add the VAP as part of this proposal, solving the CVE.
  2. In parallel, propose ways to remove all the different scenarios where virt-handler patches the node (there are more than just node-labeller). For example, I plan to POC the removal of the heartbeat mechanism. Other suggestions to remove patches were already made on this PR (see solution 3); more suggestions are always welcome.
  3. Once all patches are gone, we can say goodbye to this VAP, and also to the RBAC permission to patch the node.

RamLavi avatar May 09 '24 17:05 RamLavi

/approve

vladikr avatar May 09 '24 17:05 vladikr

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vladikr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

kubevirt-bot avatar May 09 '24 17:05 kubevirt-bot

/hold cancel

@fabiand Seems we have agreement; I hope I did not miss anyone. Thanks everyone for the discussion, hopefully we can do more than fix CVEs in the future ;)

xpivarc avatar May 10 '24 15:05 xpivarc

Awesome

10x

🚀

@xpivarc we still need to follow up to limit a handler's powers to the workloads on its own node, right? Can VAP help here as well?

fabiand avatar May 23 '24 13:05 fabiand

Awesome

10x

🚀

@xpivarc we still need to follow up to limit a handler's powers to the workloads on its own node, right? Can VAP help here as well?

Yes, the VAP could be used here, but it is not a strict requirement, as we still have webhooks for those resources. Therefore it is a matter of extending them to check that the right handler is making the request.

xpivarc avatar May 27 '24 08:05 xpivarc