vHostUser interface Design Proposal
This is a design proposal for the vhostuser interface to support the userspace networking feature for KubeVirt VMs.
The document describes the proposed feature/interface, why it is needed, and how it can enable fast packet processing when DPDK is available.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cwilkers for approval by writing /assign @cwilkers
in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.
Hi @nvinnakota10. Thanks for your PR.
I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@nvinnakota10 I'm still missing the scheduling part, or is it part of the CNI? My question is: will a VMI requesting this network interface be automatically scheduled on nodes where the CNI plugin is installed, or does the virt-launcher pod require additional information?
/cc
> @nvinnakota10 I'm still missing the scheduling part, or is it part of the CNI? My question is: will a VMI requesting this network interface be automatically scheduled on nodes where the CNI plugin is installed, or does the virt-launcher pod require additional information?
Scheduling is handled by Kubernetes. In the VM spec, the user can specify custom labels and use them as node selectors. Moreover, the user can specify the required huge pages and memory requests. This allows the user to label the nodes that can support the feature, and Kubernetes to schedule accordingly based on those requirements.
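For illustration, here is a minimal sketch of such a VMI spec, assuming a hypothetical node label dpdk: enabled; the label, names, and sizes are illustrative, not part of the proposal:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-vhostuser        # illustrative name
spec:
  nodeSelector:
    dpdk: enabled            # hypothetical label the operator puts on capable nodes
  domain:
    memory:
      hugepages:
        pageSize: "2Mi"      # vhost-user guest memory is typically backed by huge pages
    resources:
      requests:
        memory: 4Gi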
Please use the term vhost-user-net to refer to the network device, since there are other vhost-user devices too (blk, fs, gpio, etc). KubeVirt may support other vhost-user devices in the future, so it helps to be clear about what is specific to vhost-user-net and what is generic vhost-user infrastructure that other devices will also use.
Edit: To clarify what I mean, please change the title of this proposal to vhost-user-net so it's clear this vhost-user integration is specific to network interfaces.
By the way, I'm not sure this is relevant, but we're working to add vhost-user support to passt, and KubeVirt could very reasonably use it for improved throughput once it's ready.
The current QEMU and passt command lines are something along the lines of:
./passt --vhost-user -s /tmp/passt_1.socket
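# vhost-user requires the guest RAM to be shared with the back-end, hence the memfd memory backend with share=on below: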
qemu-system-x86_64 ... -m 4G \
-object memory-backend-memfd,id=memfd0,share=on,size=4G \
-numa node,memdev=memfd0 \
-chardev socket,id=chr0,path=/tmp/passt_1.socket \
-netdev vhost-user,id=netdev0,chardev=chr0 \
-device virtio-net,netdev=netdev0 \
...
Of course, the aim is different from DPDK's: passt aims at unprivileged isolation rather than direct access, and it lives in the pod itself (no need to pierce through namespaces to get to the UNIX socket), so I guess it should have no influence on this proposal, but I thought I'd mention it anyway.
@sbrivio-rh the setup for passt is for sure different, but I think the KubeVirt API extension for vhost-user-net interface can help for passt too. I'm referring to the discussion in https://github.com/kubevirt/community/pull/218#discussion_r1183788198
Hi @nvinnakota10, would it be possible to refer to the CNI providing this functionality and include how it works for Pods?
:+1: Yes, please. It's worth defining the interface for regular k8s Pods (without KubeVirt in the picture). For example, someone may want to run a DPDK application in a regular Pod attached to a DPDK software switch provided by CNI using vhost-user.
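As a rough sketch of that Pod-only scenario, assuming Multus plus a userspace CNI plugin behind a hypothetical NetworkAttachmentDefinition named vhostuser-net (all names and sizes are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: vhostuser-net   # hypothetical NAD backed by a userspace CNI
spec:
  containers:
  - name: testpmd
    image: example.com/dpdk-testpmd:latest       # illustrative DPDK application image
    resources:
      requests:
        memory: 1Gi
        hugepages-2Mi: 1Gi
      limits:
        memory: 1Gi
        hugepages-2Mi: 1Gi                       # hugepages require equal requests and limits
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages

The CNI plugin would place the vhost-user UNIX socket where the DPDK application in the Pod can reach it; how that socket is exposed is exactly the interface worth defining here.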
@nvinnakota10 I don't see a URL for a video conference in the kick-off event calendar invite you sent. Can you share the URL?
Hey, @nvinnakota10 and all. Raised this also in the mailing list, but just decided to duplicate here. I am curious about the status of this PR. Any chance we can move this forward? What are the current blockers? Ping @maiqueb, @EdDev, @xpivarc, @alicefr.
@vasiliy-ul I might be wrong, but I think this is blocked by the fact that we would like to separate the network plugin externally and we are missing the infrastructure for this
UPDATE: this might already be available from this PR: https://github.com/kubevirt/kubevirt/pull/10284
@alicefr, thanks for the link :+1: I was not aware of that PR. Will take a closer look.
I've seen this concern raised here in this PR, but from the conversations it is not clear if it's a blocker or not: https://github.com/kubevirt/community/pull/218#discussion_r1210607218 and https://github.com/kubevirt/community/pull/218#pullrequestreview-1419855883
So, does that mean that from now on this new API is the way to go if we need to introduce network bindings?
This was at least my understanding, @EdDev could you please confirm this?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
The networking bindings were already refactored, and it is now possible for community-provided bindings to live out of (KubeVirt) tree.
Should this proposal be revisited under these new terms?
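For reference, a minimal sketch of what the out-of-tree route looks like with the network binding plugin API; the plugin name, image, and NAD reference below are illustrative:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    network:
      binding:
        vhostuser:                                             # illustrative plugin name
          sidecarImage: example.com/vhostuser-binding:latest   # illustrative sidecar image
          networkAttachmentDefinition: default/vhostuser-net   # illustrative NAD reference

A VMI would then select the plugin per interface with binding: {name: vhostuser} under spec.domain.devices.interfaces.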
> Should this proposal be revisited under these new terms?
Which prompts an update from my side: we're about to merge vhost-user support in passt.
I guess it's going to be transparent for KubeVirt, that is, libvirt will eventually pass different options to both passt (--vhost-user and perhaps a different socket path) and QEMU (essentially the memory-backend-memfd object).
So, that support doesn't aim in any way at replacing this proposal, and probably it doesn't integrate with it at all, but please let me know if opinions differ.
Well, while it doesn't intend to replace this proposal, it surely brings forward a "cheaper" alternative.
It would be interesting to see an "alternatives" section in this proposal focusing on that @sbrivio-rh .
Thanks for raising it.
@alicefr @EdDev @phoracek in light of the other plugin-based vhost-user proposal - what should we do about this one? Can it be closed?
@fabiand I would, at least, wait until we see the other proposal. Otherwise, @nvinnakota10 are you still working on it?
Fair point. When are we expecting the new proposal?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Hi All,
We are still working on a proper implementation of the vhostuser network binding plugin and everything around it:
- device plugin
- device-info spec implementation (see the sketch below)
- downward API support for the plugin in KubeVirt 1.3
- and a workaround for KubeVirt 1.2: a mutating webhook
Benoit.
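To illustrate the device-info piece: a device-info-aware CNI publishes the vhost-user socket details in the Pod's network-status annotation, roughly along these lines. The exact schema is defined by the NPWG device-info spec; the field names, socket path, and NAD name below are illustrative:

k8s.v1.cni.cncf.io/network-status: |
  [{
    "name": "default/vhostuser-net",
    "interface": "net1",
    "device-info": {
      "type": "vhost-user",
      "version": "1.0.0",
      "vhost-user": {
        "mode": "client",
        "path": "/var/run/vhostuser/net1.sock"
      }
    }
  }]

The downward API support mentioned above would, presumably, expose this annotation inside the virt-launcher pod so the binding plugin can hand the socket path to the domain.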
/remove-lifecycle rotten
/remove-lifecycle stale