coreos-assembler kola: provide node to node connectivity to QEMU platform for multi-node tests

Hi,

In the flatcar-linux/mantle fork, we implemented a while back a network setup that provides Internet connectivity to QEMU instances: see this PR, and this commit in particular: https://github.com/flatcar-linux/mantle/commit/d065c5df4ab980966c482aa40547f55b2594d6b3.

Do you think this is something you would be interested in bringing into your codebase? According to the documentation:

Local platforms do not rely on access to the Internet as a design principle of kola, minimizing external dependencies. Any network services required get built directly into kola itself.

I think the proposed implementation fulfills this requirement:

[...] Any network services required get built directly into kola itself.

We only rely on a virtual Ethernet (veth) pair and NAT.

Mar 07 '22 14:03 tormath1

Our main pipeline runs unprivileged in Kubernetes/OpenShift. I think some of the veth and iptables stuff can be done in an unprivileged network namespace, but AIUI it's hard to do all useful networking fully unprivileged.

In practice I think what we really want is to do libvirt-based testing, not direct qemu for this use case. In this case libvirt is kind of multitenant but in practice it's so easy to "leak state" and have tests conflict that the model should be:

provision host with libvirt (could even be a FCOS system with libvirt package layered)
schedule container with mantle (coreos-assembler) on that host that has access to libvirt socket
Tear down that host

An alternative optimization here is to retain a provisioned libvirt host between tests, but flush all libvirt state.

Mar 07 '22 16:03 cgwalters

Thanks @tormath1 for reaching out to collaborate.

In mantle today we don't have any internet restrictions in our qemu tests. I know this is working because we have a lot of Fedora CoreOS tests that reach out to the network to perform various actions.

There are two things that happened some time ago (before the mantle code base was merged into coreos-assembler) that I think gave us this:

We added qemu-unpriv and started using that for all qemu tests
- https://github.com/coreos/mantle/commit/67b9c0d
We dropped the original restriction around networking
- https://github.com/coreos/mantle/commit/01cd295

Mar 07 '22 20:03 dustymabe

Thanks for the details. The "qemu-unpriv machines cannot communicate" limitation is still valid though, isn't it? That's why we extended the QEMU platform (I've read that there are tricks to let the unprivileged slirp setup communicate, but we haven't looked into them).

Mar 09 '22 15:03 pothos

Hey @pothos, we use qemu-unpriv for pretty much all of our testing and we access the network in a large portion of our tests. For example, if you want to run a test that pulls a container from a container registry and runs it, you can do that with qemu-unpriv.

Where is the "qemu-unpriv machines cannot communicate" text that you refer to coming from? Our documentation?

Mar 09 '22 18:03 dustymabe

The kola test annotations for excluding qemu-unpriv: https://github.com/coreos/coreos-assembler/search?q=qemu-unpriv+machines+cannot+communicate&type=

That's the main reason we stick to the other qemu platform: it allows us to run things like the kubeadm test @tormath1 added (I realize that we haven't excluded qemu-unpriv there yet, but we have to, since it won't work).

Mar 09 '22 21:03 pothos

Most of our kola tests are single-node. qemu-unpriv nodes can communicate with the Internet, but nodes in multiple-node test clusters cannot communicate with each other.

Mar 15 '22 16:03 bgilbert

Ahh, ok now I understand what you were asking.

Mar 15 '22 16:03 dustymabe

Updated the title to be more accurate.

Mar 15 '22 16:03 dustymabe

IIUC we ripped out the qemu platform so I doubt we're going to reinstate it. In that case this turns into one of the following two options:

close this request because we can't/won't reinstate the old code
add support for inter-node connectivity to qemu-unpriv somehow
- not sure if this is possible (might have to get creative)

Mar 15 '22 16:03 dustymabe

A while ago I searched and found this: https://lists.gnu.org/archive/html/qemu-discuss/2014-11/msg00020.html and the socket backend (-netdev socket,id=mynet0,listen=:1234 and -netdev socket,id=mynet0,connect=:1234).

Edit: this looks doable: https://gist.github.com/mcastelino/88195a7d99811a177f5e643d1465e19e

Edit2: implemented it here: https://github.com/flatcar-linux/mantle/pull/307
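As an illustration of the socket backend mentioned above, here is a minimal Go sketch that assembles the two -netdev socket argument sets, one for the node that listens and one for the node that connects. It is not the code from the linked pull request: the helper name, the MAC addresses, and the choice of virtio-net-pci as the guest NIC are assumptions made for the example. The netdev id and port are the ones quoted in the comment.

```go
package main

import (
	"fmt"
	"strings"
)

// socketNetdevArgs returns QEMU arguments that attach a guest NIC to a
// socket-backed netdev. One node listens on the port, its peer connects.
// The helper itself and the virtio-net-pci device are illustrative
// assumptions, not part of kola.
func socketNetdevArgs(id string, port int, listen bool, mac string) []string {
	mode := "connect"
	if listen {
		mode = "listen"
	}
	return []string{
		"-netdev", fmt.Sprintf("socket,id=%s,%s=:%d", id, mode, port),
		// Each guest needs a distinct MAC address on the shared link.
		"-device", fmt.Sprintf("virtio-net-pci,netdev=%s,mac=%s", id, mac),
	}
}

func main() {
	first := socketNetdevArgs("mynet0", 1234, true, "52:54:00:12:34:56")
	second := socketNetdevArgs("mynet0", 1234, false, "52:54:00:12:34:57")
	fmt.Println("node 1:", strings.Join(first, " "))
	fmt.Println("node 2:", strings.Join(second, " "))
}
```

The listening instance has to be started before the connecting one. As far as I know, a listen/connect pair links exactly two instances, so clusters with more nodes would need a different topology.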

Sure, we can close this, it was more a hint in case you may have interest.

Mar 15 '22 18:03 pothos

So...quite a while ago we merged coreos-assembler and mantle. There were a lot of benefits but also drawbacks to this.

I think what we can try to do is factor out at least our qemu code into a separate Go module. Then it seems relatively straightforward to share maintenance of that with flatcar. WDYT?

Mar 22 '22 14:03 cgwalters

With https://github.com/flatcar/Flatcar/issues/1386, I'm wondering again whether we should sit together and see what we can do to merge fcos/mantle back together with flatcar/mantle. Both worlds could benefit from such a merge: for example, we added the Brightbox and Scaleway platforms to our Flatcar fork. Users would benefit from this merge too, as we would cover more test scenarios.

Mar 05 '24 16:03 tormath1

Agree it would be nice if we could converge/merge back the mantle tools. As you mentioned we would get Scaleway support.

A while back, we fully merged the mantle code into our coreos-assembler repo but I think it was mostly to simplify building and testing things in a single PR on our side. It should still be usable "standalone".

How do you use mantle in your CI?

Mar 07 '24 15:03 travier

In Flatcar's CI, Mantle (kola, plume and ore) is consumed via its Docker image. For each commit in Mantle, a Docker image is built and this image is consumed in the CI (https://github.com/flatcar/scripts/blob/main/sdk_container/.repo/manifests/mantle-container).

A first step would be to get an overview of the diff between the two projects. We could then decide whether we go with a common library and keep the FCOS- and Flatcar-specific bits downstream, or merge everything back into a single project.

Mar 08 '24 08:03 tormath1
