ci: add test for rootful docker

Open vsoch opened this issue 10 months ago • 18 comments

I am finding in testing that networking between hosts does not work when running in rootful mode. I was testing this because using NVIDIA devices does work with rootful, but once I got to the step of needing pods to communicate, there was no communication.

I am not sure about the error, but this test should reproduce it in CI. Note that to enable this we use the docker-rootful template provided by lima (@AkihiroSuda you have thought of all things)! The main changes here are to add this test to the matrix, and ensure that in the different install scripts, we largely do nothing if the container runtime is docker-rootful.

Related to #365 but does not fix it, only demonstrates it.

vsoch avatar Feb 21 '25 17:02 vsoch

Note that I've seen two variants of this error - either an operation timeout (the result here):

image

Or that the address is not reachable / bad (what I've seen in production and my researchapps testing CI):

image

vsoch avatar Feb 21 '25 17:02 vsoch

Thanks, I confirmed that this issue happens on my local machines too, but I haven't identified the cause.

Tested with Docker v28 and v27.5.1, on Ubuntu 24.04.1 (ARM64).

I think it was working in the past?

AkihiroSuda avatar Feb 24 '25 12:02 AkihiroSuda

ICMP and DNS still seem to work, but TCP across the nodes seems broken?

VXLAN packets are apparently sent and received on each of the VMs, though. (Run tcpdump udp).

Apparently, the receiver VM is refusing to route the VXLAN packets to the usernetes-node-1 container, where kubelet, flannel, etc. are running.
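For anyone who wants to reproduce the capture, here is a minimal sketch of a narrower filter than plain `tcpdump udp`. It assumes flannel's VXLAN backend is on UDP port 8472 (its default on Linux) and that the VM's uplink interface is named eth0; `vxlan_capture_cmd` is a hypothetical helper that only prints the command so it can be reviewed before running.

```shell
# Assumption: flannel's VXLAN backend uses UDP port 8472 (its Linux default);
# adjust VXLAN_PORT if the backend config sets a different Port.
VXLAN_PORT="${VXLAN_PORT:-8472}"
IFACE="${IFACE:-eth0}"   # assumed name of the VM's uplink interface

# Hypothetical helper: build the tcpdump command without executing it.
vxlan_capture_cmd() {
  printf 'tcpdump -ni %s udp port %s\n' "$IFACE" "$VXLAN_PORT"
}

vxlan_capture_cmd
# To capture on a VM (needs root): sudo sh -c "$(vxlan_capture_cmd)"
```

Running it on both the sender and the receiver VM while a pod on one node opens a TCP connection to a pod on the other should show whether the encapsulated packets arrive at the receiving VM.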

AkihiroSuda avatar Feb 24 '25 12:02 AkihiroSuda

Found a workaround: execute ethtool --offload eth0 tx-checksum-ip-generic off in the usernetes-node-1 container
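A sketch of applying the workaround from the host, assuming the container is reachable via `docker exec` under the name usernetes-node-1 and that ethtool is installed inside it. `workaround_cmds` is a hypothetical helper that prints the commands rather than running them; the second command only displays the current offload settings so the change can be checked.

```shell
NODE="${NODE:-usernetes-node-1}"   # container name from the comment above
IFACE="${IFACE:-eth0}"             # interface named in the workaround

# Hypothetical helper: print the commands so they can be reviewed first.
workaround_cmds() {
  echo "docker exec $NODE ethtool --offload $IFACE tx-checksum-ip-generic off"
  echo "docker exec $NODE ethtool --show-offload $IFACE"
}

workaround_cmds
# To apply on a host that runs the container: workaround_cmds | sh
```

The setting is not persistent, so it would need to be reapplied whenever the container is recreated.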

AkihiroSuda avatar Feb 24 '25 13:02 AkihiroSuda

Any eyes needed here from the Moby networking folks? (I know they're pretty busy currently, but if it's useful I can try asking them if they have time to spare to give it eyes)

thaJeztah avatar Feb 24 '25 13:02 thaJeztah

@AkihiroSuda do you remember the last time you tested with it working? In recent memory we had updates to flannel, the underlying kind node (Kubernetes version), and (for me) at some point last year the additional make sync-external-ip was added. If we can reproduce a previously working version it could be a good strategy to debug (to compare to).

vsoch avatar Feb 24 '25 16:02 vsoch

oh wow, this is really interesting!

Not sure if this is expected, but this looks to be a warning in the failed nerdctl setup:

Warning: [WARNING] buildkitd has access to images in "buildkit" namespace by default. If you want to give buildkitd access to the images in "default" namespace, run this command with CONTAINERD_NAMESPACE=default

vsoch avatar Feb 24 '25 16:02 vsoch

The ethtool --offload eth0 tx-checksum-ip-generic off rule can probably be appended here: https://github.com/rootless-containers/usernetes/blob/b259da818f84fe33fe9ea32c71c9ea7317d467cc/Dockerfile.d/etc_udev_rules.d_90-flannel.rules#L1-L5

It is still unclear why this is needed only for rootful, though.
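A rough sketch of what appending such a rule could look like. The rule text is hypothetical and not taken from the repository: the match keys, the /usr/sbin/ethtool path, and the target path /etc/udev/rules.d/90-flannel.rules (inferred from the file name in the link) are all assumptions, as is the idea that udev sees an "add" event for eth0 inside the node container.

```shell
# Hypothetical rule modelled on the workaround command; not from the repository.
RULE='ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth0", RUN+="/usr/sbin/ethtool --offload eth0 tx-checksum-ip-generic off"'

# RULES_FILE defaults to a scratch file so the sketch is safe to run anywhere;
# in the image it would presumably be /etc/udev/rules.d/90-flannel.rules.
RULES_FILE="${RULES_FILE:-$(mktemp)}"

# Append only if the exact rule is not already present (keeps the step idempotent).
grep -qxF -- "$RULE" "$RULES_FILE" 2>/dev/null || printf '%s\n' "$RULE" >> "$RULES_FILE"

cat "$RULES_FILE"
```

Since the workaround is only needed for rootful, the rule would have to be skipped or made conditional when the container runtime is rootless.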

Any eyes needed here from the Moby networking folks? (I know they're pretty busy currently, but if it's useful I can try asking them if they have time to spare to give it eyes)

Thanks, that would be appreciated.

AkihiroSuda avatar Feb 25 '25 10:02 AkihiroSuda

Warning: [WARNING] buildkitd has access to images in "buildkit" namespace by default. If you want to give buildkitd access to the images in "default" namespace, run this command with CONTAINERD_NAMESPACE=default

Irrelevant to the topic. Should be fixed though.

AkihiroSuda avatar Feb 25 '25 10:02 AkihiroSuda

@vsoch Do you plan to continue this?

AkihiroSuda avatar May 01 '25 02:05 AkihiroSuda

I would like to - from this comment: https://github.com/rootless-containers/usernetes/pull/366#issuecomment-2681540363 I thought we were waiting for feedback from the Moby networking folks. Is the next step to try adding that line ethtool --offload eth0 tx-checksum-ip-generic off to the flannel rules?

vsoch avatar May 01 '25 06:05 vsoch

Is the next step to try adding that line ethtool --offload eth0 tx-checksum-ip-generic off to the flannel rules?

Yes (when running in rootful), and let's call it a day

AkihiroSuda avatar May 01 '25 06:05 AkihiroSuda

/cc @robmry @akerouanton

thaJeztah avatar May 01 '25 06:05 thaJeztah

Sounds good - I'll make some time in the next few days. It's after 1am here so I need to be off to sleep, but this is on my todo. Thanks for the ping @AkihiroSuda.

vsoch avatar May 01 '25 07:05 vsoch

Access from outside a host to container addresses inside bridge networks got blocked in moby 28.0, is that the issue? https://www.docker.com/blog/docker-engine-28-hardening-container-networking-by-default/

robmry avatar May 01 '25 07:05 robmry

If running dockerd with env var DOCKER_INSECURE_NO_IPTABLES_RAW=1 makes it work - that's the issue. Either way, I'd like to know more about what the network looks like - is it direct routing between container addresses, or do you have an overlay network in there?
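One way to try this, sketched under the assumption that dockerd is managed by systemd as docker.service: put the variable in a drop-in file. The drop-in file name is hypothetical, and the directory defaults to a scratch location so the snippet can be run without root to inspect the result.

```shell
# DROPIN_DIR defaults to a scratch directory so the sketch can run without root;
# on a systemd host it would be /etc/systemd/system/docker.service.d.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
DROPIN_FILE="$DROPIN_DIR/no-iptables-raw.conf"   # hypothetical file name

mkdir -p "$DROPIN_DIR"
# Write a unit override that sets the environment variable for dockerd.
cat > "$DROPIN_FILE" <<'EOF'
[Service]
Environment=DOCKER_INSECURE_NO_IPTABLES_RAW=1
EOF

cat "$DROPIN_FILE"
# After writing to the real location (as root):
#   systemctl daemon-reload && systemctl restart docker
```

If cross-node TCP starts working with this set, that points at the 28.0 hardening; if not, the drop-in can simply be deleted and the daemon restarted.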

robmry avatar May 01 '25 08:05 robmry

@AkihiroSuda I tried both approaches suggested above, still issues. I left both commits / changes for feedback. Let me know what I should try next.

vsoch avatar May 01 '25 13:05 vsoch

@AkihiroSuda do you have another suggestion for what to try here? We'd like to try rootful soon - we have some overhead running rootless and want to test if running with rootful removes it (and then we could deduce it's something about user space).

vsoch avatar May 13 '25 01:05 vsoch