Add virtual network switch component
VSwitch
A virtual network switch (vswitch) forwards Ethernet frames between components, much like a physical network switch.
Current Design
A vswitch contains a number of ports, each consisting of a set of sDDF network queues (TX active & free, RX active & free). These ports correspond to Ethernet ports on a physical switch.
The vswitch forwards frames like a normal switch. It doesn't know MAC addresses at compile time; instead, it builds a forwarding table at runtime from the broadcasts and responses it receives.
There is a many-to-one relationship between MAC addresses and ports. Several MAC addresses may be associated with a single port if that port connects to another vswitch or virtualiser.
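To make the runtime learning concrete, here is a minimal sketch of a MAC learning table. It is not the code in this PR: the names `fwd_table_t`, `fwd_learn`, `fwd_lookup`, `TABLE_SIZE` and `PORT_NONE` are hypothetical, and the linear scan is chosen only for brevity. It assumes the usual learning-switch behaviour, where the source MAC of each received frame is recorded against the ingress port and unknown or group destinations are flooded.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define TABLE_SIZE 64   /* hypothetical capacity */
#define PORT_NONE (-1)  /* destination unknown: caller should flood */

typedef struct {
    uint8_t mac[MAC_LEN];
    int port;
    bool valid;
} fwd_entry_t;

typedef struct {
    fwd_entry_t entries[TABLE_SIZE];
} fwd_table_t;

/* Broadcast and multicast addresses have the low bit of the first octet set. */
static bool mac_is_group(const uint8_t *mac)
{
    return (mac[0] & 1u) != 0;
}

/* Return the port a destination MAC was learnt on, or PORT_NONE. */
static int fwd_lookup(const fwd_table_t *t, const uint8_t *dst_mac)
{
    if (mac_is_group(dst_mac)) {
        return PORT_NONE;
    }
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->entries[i].valid
            && memcmp(t->entries[i].mac, dst_mac, MAC_LEN) == 0) {
            return t->entries[i].port;
        }
    }
    return PORT_NONE;
}

/* Record that src_mac was seen on port. Several MACs may share one port. */
static void fwd_learn(fwd_table_t *t, const uint8_t *src_mac, int port)
{
    int free_slot = -1;

    if (mac_is_group(src_mac)) {
        return; /* a group address is never a valid source */
    }
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!t->entries[i].valid) {
            if (free_slot < 0) {
                free_slot = i;
            }
        } else if (memcmp(t->entries[i].mac, src_mac, MAC_LEN) == 0) {
            t->entries[i].port = port; /* MAC moved to another port */
            return;
        }
    }
    if (free_slot >= 0) {
        memcpy(t->entries[free_slot].mac, src_mac, MAC_LEN);
        t->entries[free_slot].port = port;
        t->entries[free_slot].valid = true;
    }
    /* Table full: the MAC is not learnt, so frames to it keep being flooded. */
}
```

Because the table is keyed by MAC rather than by port, the many-to-one case (a port that leads to another vswitch or virtualiser) needs no special handling.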
Filtering
Filtering is done at the port level. Each port $P$ has an associated `allow_list` bitmap specifying which destination ports $P$ may send to. This is enough to, for example, allow only some clients in a system to access the outside world (i.e., the network card). A future design could add filtering at the MAC address level; however, this is more complex to implement. Another option is to let the user supply a `bool vswitch_can_send(src, dest)` function so that the policy is interchangeable.
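As an illustration of the port-level check, the sketch below stores one bit per destination port in a 32-bit `allow_list` and exposes it through the `vswitch_can_send(src, dest)` signature mentioned above. The port limit, the `vswitch_port_t` layout and the `vswitch_allow` helper are assumptions made for the example, not the PR's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define VSWITCH_MAX_PORTS 32 /* hypothetical: one bit per port in a uint32_t */

typedef struct {
    uint32_t allow_list; /* bit d set => this port may send to port d */
} vswitch_port_t;

static vswitch_port_t ports[VSWITCH_MAX_PORTS];

static bool port_in_range(int port)
{
    return port >= 0 && port < VSWITCH_MAX_PORTS;
}

/* Hypothetical helper: permit traffic from src to dest (one direction only). */
static void vswitch_allow(int src, int dest)
{
    if (port_in_range(src) && port_in_range(dest)) {
        ports[src].allow_list |= UINT32_C(1) << dest;
    }
}

/* Policy hook: may a frame that arrived on src be forwarded to dest? */
bool vswitch_can_send(int src, int dest)
{
    if (!port_in_range(src) || !port_in_range(dest)) {
        return false;
    }
    return ((ports[src].allow_list >> dest) & 1u) != 0;
}
```

With port 0 attached to the network card, calling `vswitch_allow(1, 0)` and `vswitch_allow(0, 1)` would give client 1 outside access, while a client with no bits set towards port 0 stays internal. Keeping the check behind a single function is what would make it replaceable by a user-supplied policy.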
Copying
The vswitch currently copies packets when forwarding, as otherwise free buffers could all accumulate on a single (destination) client. It may be possible to move the copying into copy components situated between the vswitch and a client's RX port; however, care needs to be taken to return free buffers to their original sender.
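The buffer-ownership argument can be sketched as follows. This is a simplified model rather than the sDDF queue API: `queue_t`, `buf_t`, `port_t` and `forward_copy` are hypothetical, and the queues are plain single-threaded rings. The point it shows is that the payload is copied into a buffer taken from the destination's own RX free queue, and the source buffer goes straight back to the source's TX free queue, so no buffer ever changes owner.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048 /* hypothetical buffer size */
#define QUEUE_CAP 8   /* hypothetical queue capacity (power of two) */

typedef struct { uint8_t *data; size_t len; } buf_t;
typedef struct { buf_t slots[QUEUE_CAP]; unsigned head, tail; } queue_t;

static bool q_empty(const queue_t *q) { return q->head == q->tail; }
static bool q_full(const queue_t *q) { return q->tail - q->head == QUEUE_CAP; }

static bool q_push(queue_t *q, buf_t b)
{
    if (q_full(q)) return false;
    q->slots[q->tail++ % QUEUE_CAP] = b;
    return true;
}

static bool q_pop(queue_t *q, buf_t *b)
{
    if (q_empty(q)) return false;
    *b = q->slots[q->head++ % QUEUE_CAP];
    return true;
}

/* One vswitch port: the four queues named in the design section. */
typedef struct { queue_t tx_active, tx_free, rx_active, rx_free; } port_t;

/* Forward one frame from src to dst by copying. Returns true if delivered. */
static bool forward_copy(port_t *src, port_t *dst)
{
    buf_t in, out;

    /* Check every queue up front so the operation cannot fail half-way. */
    if (q_empty(&src->tx_active) || q_full(&src->tx_free)
        || q_empty(&dst->rx_free) || q_full(&dst->rx_active)) {
        return false;
    }
    q_pop(&src->tx_active, &in);
    q_pop(&dst->rx_free, &out);

    if (in.len > BUF_SIZE) {
        /* Oversized frame: drop it and hand both buffers back to their owners. */
        q_push(&dst->rx_free, out);
        q_push(&src->tx_free, in);
        return false;
    }
    memcpy(out.data, in.data, in.len);
    out.len = in.len;
    q_push(&dst->rx_active, out); /* destination receives its own buffer */

    in.len = 0;
    q_push(&src->tx_free, in);    /* source gets its buffer back immediately */
    return true;
}
```

A zero-copy variant would instead enqueue `in` on `dst->rx_active`, which is exactly the case where free buffers drift to the destination and need a return path to the original sender.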
Example System
Testing is currently being done on libvmm alexbr/vswitch
Tasks
- [x] Create test example
- [x] Get working on QEMU
- [x] Get working on OdroidC4
- [ ] Determine best vswitch priority
- [ ] Use `memcpy` instead of `sddf_memcpy`
- [ ] Sort out `vswitch_config.h` and `ethernet_config.h` format (see Open Question below)
- [ ] Later: make MAC address lookup table swappable to another data structure (the optimal data structure will depend on the use case). This is easily done by moving definitions to a header and/or their own C file.
- [ ] Update to Microkit 1.4.0. In this version, the time the VM sees goes from real elapsed time to VCPU elapsed time. This slows down VM networking which relies heavily on elapsed time.
Open Question: who should multiplex?
@wom-bat @Courtney3141 @Ivan-Velickovic the following is a design question I think you guys should be aware of.
Currently both the vswitch and the virtualiser multiplex packets, so they have overlapping responsibilities. The net virtualiser implements a one-to-one mapping of MAC addresses to clients. In a system with a vswitch or a firewall, there may be a many-to-one mapping of MACs to clients. To get around this issue, I had to turn off the multiplexing part of the virtualiser as follows:
// network/components/virt_rx.c
int client = 0; // get_mac_addr_match((struct ethernet_header *) buffer_vaddr);
Since this is a general issue regarding component behaviour, I thought it would be good to discuss solutions to this issue (from easy workarounds to more principled approaches). If performance weren't an issue, I'd say it would be natural to make the only job of the virtualiser to convert offsets and perform cache operations, and leave multiplexing to a separate component (vswitch or otherwise). In the case that this results in significant performance loss, I think there are a few options:
- Change existing virtualisers to support a many-to-one mapping of MACs to client IDs (not principled IMO, since the vswitch already does this; you would pretty much be implementing a static vswitch)
- Create a new type of virtualiser which doesn't multiplex and put this in front of the vswitch (but keep current virtualiser component as well)
- Change old virtualiser design to not multiplex and require another multiplexing component to be used (e.g., vswitch)
- Make vswitch a type of virtualiser which also performs cache ops and offset translation (not a nice design since the vswitch is designed such that it is agnostic to what type of component is connected to each switch port)
Another approach would be to improve how we implement composability in the system. Currently, systems are composable by swapping out components. This has the downside that every point of composability needs extra IPCs between PDs. At the other end of the spectrum, traditional systems use software libraries (with common APIs) to do this. I propose we implement these components as single-file libraries to allow the system designer to decide how much isolation or performance they desire. A critical system would put the vswitch / multiplexing library in a separate component from the cache ops / address translation, while a more performant system could merge these into one PD (similar to Hong-Meng). This would also reduce code duplication, making the code less error-prone. Eventually, a user could specify during the build process which components they want to combine for performance reasons.