coreos-assembler
kola: provide node to node connectivity to QEMU platform for multi-node tests
Hi,
In the flatcar-linux/mantle fork, we implemented a network setup some time ago that provides Internet connectivity to QEMU instances: see this PR, and this commit in particular: https://github.com/flatcar-linux/mantle/commit/d065c5df4ab980966c482aa40547f55b2594d6b3.
Would you be interested in bringing this into your codebase? According to the documentation:
Local platforms do not rely on access to the Internet as a design principle of kola, minimizing external dependencies. Any network services required get built directly into kola itself.
I think the proposed implementation fulfills this requirement:
[...] Any network services required get built directly into kola itself.
We rely only on a virtual Ethernet (veth) pair and NAT.
For us, our main pipeline runs unprivileged in Kubernetes/OpenShift. I think some of the veth and iptables setup can be done in an unprivileged network namespace, but AIUI it's hard to do all useful networking fully unprivileged.
In practice, I think what we really want for this use case is libvirt-based testing, not direct qemu. libvirt is kind of multitenant, but in practice it's so easy to "leak state" and have tests conflict that the model should be:
- provision host with libvirt (could even be a FCOS system with libvirt package layered)
- schedule a container with mantle (coreos-assembler) on that host, with access to the libvirt socket
- tear down that host
An alternative optimization is to keep a provisioned libvirt host between tests, but flush all libvirt state.
Thanks @tormath1 for reaching out to collaborate.
In mantle today, we don't have any Internet restrictions in our qemu tests. I know this works because we have a lot of Fedora CoreOS tests that reach out to the network to perform various actions.
There are two things that happened some time ago (before the mantle code base was merged into coreos-assembler) that I think gave us this:
- We added qemu-unpriv and started using it for all qemu tests: https://github.com/coreos/mantle/commit/67b9c0d
- We dropped the original restriction around networking: https://github.com/coreos/mantle/commit/01cd295
Thanks for the details. But "qemu-unpriv machines cannot communicate" still holds, doesn't it? That's why we extended the QEMU platform (I've read that there are tricks to let the unprivileged slirp setup communicate, but we haven't looked into them).
Hey @pothos, we use qemu-unpriv for almost all of our testing, and a large portion of our tests access the network. For example, if you want to run a test that pulls a container image from a registry and runs it, you can do that with qemu-unpriv.
Where does the "qemu-unpriv machines cannot communicate" text you're referring to come from? Our documentation?
The kola test annotations for excluding qemu-unpriv: https://github.com/coreos/coreos-assembler/search?q=qemu-unpriv+machines+cannot+communicate&type=
That's the main reason we stick with the other qemu platform: it lets us run things like the kubeadm test @tormath1 added (I realize we haven't excluded qemu-unpriv there yet, but we have to, since it won't work).
Most of our kola tests are single-node. qemu-unpriv nodes can communicate with the Internet, but nodes in multiple-node test clusters cannot communicate with each other.
Ahh, OK, now I understand what you were asking.
Updated the title to be more accurate.
IIUC, we ripped out the qemu platform, so I doubt we're going to reinstate it. In that case, this comes down to one of two options:
- close this request because we can't/won't reinstate the old code
- add support for inter-node (node-to-node) connectivity to qemu-unpriv somehow (not sure if this is possible; we might have to get creative)
A while ago I searched and found this: https://lists.gnu.org/archive/html/qemu-discuss/2014-11/msg00020.html
and the socket backend (-netdev socket,id=mynet0,listen=:1234 and -netdev socket,id=mynet0,connect=:1234)
Edit: this looks doable: https://gist.github.com/mcastelino/88195a7d99811a177f5e643d1465e19e
Edit2: implemented it here: https://github.com/flatcar-linux/mantle/pull/307
Sure, we can close this; it was more of a hint in case you were interested.
So... quite a while ago we merged coreos-assembler and mantle. That brought a lot of benefits, but also some drawbacks.
I think what we can try to do is factor out at least our qemu code into a separate Go module. Then it seems relatively straightforward to share maintenance of that with flatcar. WDYT?
With https://github.com/flatcar/Flatcar/issues/1386 in mind, I'm wondering again whether we should sit down together and see what we can do to merge fcos/mantle and flatcar/mantle back together. Both worlds could benefit from such a merge: for example, we added the Brightbox and Scaleway platforms to our Flatcar fork. Users would benefit too, since we would cover more test scenarios.
Agreed, it would be nice if we could converge and merge the mantle tools back together. As you mentioned, we would get Scaleway support.
A while back, we fully merged the mantle code into our coreos-assembler repo, but I think that was mostly to simplify building and testing changes in a single PR on our side. It should still be usable "standalone".
How do you use mantle in your CI?
In Flatcar's CI, Mantle (kola, plume and ore) is consumed via its Docker image: for each commit in Mantle, a Docker image is built, and the CI uses that image (https://github.com/flatcar/scripts/blob/main/sdk_container/.repo/manifests/mantle-container). A first step would be to get an overview of the diff between the two projects. We could then decide whether to go with a common library and keep the FCOS- and Flatcar-specific bits downstream, or merge everything back into a single project.