test: Add TEST-85-NETWORK to run systemd-networkd-tests.py

Open DaanDeMeyer opened this issue 9 months ago • 2 comments

This adds a testsuite unit to run systemd-networkd-tests.py. It is mkosi-only for now, as Python is not available in the images set up by the bash framework. We give the test a lower priority because it takes a while to run, so we want to start it as soon as possible.

DaanDeMeyer avatar May 06 '24 13:05 DaanDeMeyer

[!IMPORTANT] An -rc1 tag has been created and a release is being prepared, so please note that PRs introducing new features and APIs will be held back until the new version has been released.

github-actions[bot] avatar May 06 '24 13:05 github-actions[bot]

@yuwata Any chance you'd be able to help debug some of these failures? I don't think they're transient, since the same distributions fail, and the same ones succeed, on a second run. (Ignore CentOS 9; that one is the mirror being messed up.)

DaanDeMeyer avatar May 06 '24 20:05 DaanDeMeyer

The failure on Fedora rawhide is caused by https://github.com/torvalds/linux/commit/3ddc2231c8108302a8229d3c5849ee792a63230d.

yuwata avatar May 10 '24 06:05 yuwata

@davem330

Could you fix https://github.com/torvalds/linux/commit/3ddc2231c8108302a8229d3c5849ee792a63230d ?

https://github.com/torvalds/linux/blob/3ddc2231c8108302a8229d3c5849ee792a63230d/net/ipv4/devinet.c#L1695
Here, ifm->ifa_flags is u8, but ifa->ifa_flags is u32, so some flags are lost at the line below.
https://github.com/torvalds/linux/blob/3ddc2231c8108302a8229d3c5849ee792a63230d/net/ipv4/devinet.c#L1735
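
The narrowing described here can be illustrated with a minimal sketch. It only models the arithmetic of storing a 32-bit flag word in an 8-bit field and is not the kernel code. The two flag values are the ones defined in the kernel's if_addr.h UAPI header; truncate_to_u8 is a hypothetical helper name.

```python
# Address flag values as defined in include/uapi/linux/if_addr.h.
IFA_F_PERMANENT = 0x80        # fits in the low 8 bits
IFA_F_NOPREFIXROUTE = 0x200   # needs more than 8 bits


def truncate_to_u8(flags: int) -> int:
    """Model storing a wider flag word in an 8-bit field (hypothetical helper)."""
    return flags & 0xFF


full_flags = IFA_F_PERMANENT | IFA_F_NOPREFIXROUTE
narrow_flags = truncate_to_u8(full_flags)

print(f"u32 flags: {full_flags:#x}")
print(f"u8 flags:  {narrow_flags:#x}")
print("no-prefix-route survives:", bool(narrow_flags & IFA_F_NOPREFIXROUTE))
```

Any flag at or above 0x100 disappears after the truncation, which matches the "some flags are lost" observation.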

yuwata avatar May 10 '24 06:05 yuwata

@DaanDeMeyer About the failure on CentOS: the test seems to wrongly conclude that the sch_fq_codel module does not exist. But according to https://gitlab.com/redhat/centos-stream/rpms/kernel/-/blob/c9s/kernel-x86_64-rhel.config, sch_fq_codel is enabled and built in. Maybe modprobe is broken?
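
Whether a module is built in can be checked against the kernel's modules.builtin listing. The following is a minimal sketch, assuming that listing format (one path such as kernel/net/sched/sch_fq_codel.ko per line). is_builtin_module is a hypothetical helper, the sample listing is made up for illustration, and this is not what the test or modprobe actually does.

```python
from pathlib import PurePosixPath


def is_builtin_module(name: str, builtin_listing: str) -> bool:
    """Return True if `name` appears in a modules.builtin-style listing."""
    # Module names treat "-" and "_" as equivalent, so normalise both sides.
    wanted = name.replace("-", "_")
    for line in builtin_listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Entries look like "kernel/net/sched/sch_fq_codel.ko".
        stem = PurePosixPath(line).name.split(".", 1)[0]
        if stem.replace("-", "_") == wanted:
            return True
    return False


# Sample listing for illustration only (not taken from a real CentOS image).
sample_listing = """\
kernel/net/sched/sch_fq_codel.ko
kernel/fs/ext4/ext4.ko
"""

print(is_builtin_module("sch_fq_codel", sample_listing))
print(is_builtin_module("sch_cake", sample_listing))
```

On a real system the listing would typically be read from /lib/modules/<kernel release>/modules.builtin.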

yuwata avatar May 10 '24 08:05 yuwata

@yuwata First of all, thank you for all the test fixes and investigations!

I will take a look at centos to see if I can figure out what's going on with the module.

DaanDeMeyer avatar May 10 '24 08:05 DaanDeMeyer

@yuwata https://github.com/systemd/mkosi/pull/2699 will fix the issue.

DaanDeMeyer avatar May 10 '24 10:05 DaanDeMeyer

@DaanDeMeyer Nice. And the other remaining failure should be fixed by #32748.

yuwata avatar May 10 '24 11:05 yuwata

@yuwata I think we need a few more skips for missing modules.

DaanDeMeyer avatar May 10 '24 16:05 DaanDeMeyer

@DaanDeMeyer For the CentOS CIs and testing-farm:fedora-rawhide-x86_64, a simple Makefile for TEST-85 is necessary.

yuwata avatar May 10 '24 17:05 yuwata

@yuwata They're actually supposed to be skipped on those, since I'm not sure how hard it is to get Python into the images from the bash framework. I'll see about doing this in a way that isn't picked up by the old bash framework.

DaanDeMeyer avatar May 10 '24 17:05 DaanDeMeyer

About the activation policy failures in mkosi: unfortunately I have no idea, and I cannot reproduce the issue on my laptop. Also, the journal files of the failed tests seem to have been removed already. Is it possible to extend the lifetime of the logs of failed tests?

yuwata avatar May 10 '24 17:05 yuwata

@yuwata At the end of each test there is a command to download the logs, e.g. for the latest Arch Linux failure, you can get the journal with:

gh run download 9032381766 --name ci-mkosi-9032381766-1-arch-rolling-failed-test-journals -D ci/ci-mkosi-9032381766-1-arch-rolling-failed-test-journals && journalctl --file ci/ci-mkosi-9032381766-1-arch-rolling-failed-test-journals/test/journal/TEST-85-NETWORK.journal --no-hostname -o short-monotonic -u testsuite-85.service -p info

You have to install the gh tool (packaged in Fedora, Arch, Debian, openSUSE) and run it once in the systemd repository directory to set it up. After that, running the above downloads the journal to the ci/ directory and opens it with journalctl.

DaanDeMeyer avatar May 11 '24 13:05 DaanDeMeyer

@DaanDeMeyer Thanks.

[ 1289.041546] systemd-networkd-tests.py[453]:   test_activation_policy_required_for_online (__main__.NetworkdNetworkTests.test_activation_policy_required_for_online) (policy='always-down', required='yes') ... FAIL
[ 1289.057316] systemd-journald[300]: [🡕] Suppressed 3126 messages from systemd-networkd.service

Huh...

yuwata avatar May 13 '24 19:05 yuwata

The test generates a lot of debugging logs. Could you add something like the following?

# /etc/systemd/journald.conf.d/disable-ratelimit.conf
[Journal]
RateLimitIntervalSec=0
RateLimitBurst=0

yuwata avatar May 13 '24 19:05 yuwata

Once https://github.com/systemd/systemd/pull/32766 is in, I am going to rework this to define one meson test per test class in systemd-networkd-tests.py, as running them all in a single virtual machine takes way too long.

DaanDeMeyer avatar May 13 '24 19:05 DaanDeMeyer

Ah, I understand the failure. For some reason, the test generates a lot of debugging logs. The test checks the journal, but the expected line is suppressed because of rate limiting. Since I had disabled journal rate limiting on my laptop, I could not reproduce the issue.

Unfortunately, the commit "mkosi: Disable journald rate-limiting" does not work, and some lines are still dropped. From https://github.com/systemd/systemd/actions/runs/9068670883/job/24916498168?pr=32791:

[   55.913557] systemd-networkd-tests.py[430]:   test_activation_policy_required_for_online (__main__.NetworkdNetworkTests.test_activation_policy_required_for_online) (policy='always-down', required='yes') ... FAIL
[   55.928592] systemd-journald[299]: [🡕] Suppressed 2112 messages from systemd-networkd.service

Anyway, really disabling journal rate limiting should hopefully make the test pass.
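
To confirm whether rate limiting is still active in a downloaded journal, the output can be scanned for journald's suppression notices. This is a minimal sketch, assuming the notice has the same wording as in the log lines quoted in this thread; find_suppressed is a hypothetical helper, not part of the test suite.

```python
import re

# Matches journald's rate-limit notice as it appears in the quoted logs.
SUPPRESSED_RE = re.compile(r"Suppressed (\d+) messages from (\S+)")


def find_suppressed(journal_text: str) -> dict[str, int]:
    """Sum the number of suppressed messages per unit (hypothetical helper)."""
    totals: dict[str, int] = {}
    for count, unit in SUPPRESSED_RE.findall(journal_text):
        totals[unit] = totals.get(unit, 0) + int(count)
    return totals


sample = (
    "[   55.928592] systemd-journald[299]: [🡕] Suppressed 2112 messages "
    "from systemd-networkd.service\n"
)
print(find_suppressed(sample))
```

An empty result means no suppression notice was found in the text that was scanned.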

yuwata avatar May 13 '24 20:05 yuwata

Again, I think we should add a dummy Makefile like the following in TEST-85:

# SPDX-License-Identifier: LGPL-2.1-or-later

all setup run clean clean-again:
	true

.PHONY: all setup run clean clean-again

yuwata avatar May 14 '24 20:05 yuwata