ZTS: Use QEMU for tests on Linux and FreeBSD

Open mcmilk opened this issue 2 years ago • 7 comments

Motivation and Context

We have the need for more tests on systems != Ubuntu.

Description

This commit adds functional tests for these systems:

  • AlmaLinux 8, AlmaLinux 9
  • ArchLinux
  • CentOS Stream 8, CentOS Stream 9
  • Fedora 38, Fedora 39
  • Debian 11, Debian 12
  • FreeBSD 13, FreeBSD 14, FreeBSD 15
  • Ubuntu 22.04, Ubuntu 24.04

Workflow for each operating system:

  • install QEMU on the github runner
  • download current cloud image
  • start and init that image via cloud-init
  • install deps and poweroff system
  • start system and build openzfs and then poweroff again
  • clone the system and start 3 qemu machines for tests
  • use trimmable virtual disks (3x 2GB)
  • run the functional tests in < 3h
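
The virtual disk step above can be sketched roughly as follows. This is a minimal illustration, not the scripts from this PR: the directory and file names are hypothetical, and the `discard=unmap` drive option is an assumption about how the disks are made trimmable.

```shell
#!/bin/sh
# Hypothetical sketch: create three sparse 2 GiB backing files for one test VM.
# File names and layout are illustrative only, not taken from the PR.
DISK_DIR="${DISK_DIR:-$(mktemp -d)}"
DRIVE_ARGS=""
for i in 1 2 3; do
	disk="$DISK_DIR/zts-disk$i.raw"
	# truncate creates a sparse file, so no real space is used up front
	truncate -s 2G "$disk"
	# Assumption: discard=unmap lets guest TRIM requests reach the host file
	DRIVE_ARGS="$DRIVE_ARGS -drive file=$disk,format=raw,if=virtio,discard=unmap"
done
# Print the extra QEMU arguments instead of starting a VM
echo "$DRIVE_ARGS"
```

Sparse raw files keep the host footprint small until the tests actually write data, which matters when three VMs share one runner.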

How Has This Been Tested?

This has been tested on my own repo, but more testing is needed....

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [x] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [x] I have added tests to cover my changes.
  • [x] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

mcmilk avatar Jan 30 '24 00:01 mcmilk

I think most of the failing FreeBSD tests will be fixed by starting nfsd and samba.

mcmilk avatar Jan 30 '24 19:01 mcmilk

@mcmilk I see you currently have this marked as "Draft". When you think it's ready to be reviewed, please let us know and we can take a look.

tonyhutter avatar Feb 02 '24 17:02 tonyhutter

Seems ready. I included the FreeBSD src.txz within the FreeBSD cloud image, but these tests will take some time ;-)

mcmilk avatar Mar 03 '24 21:03 mcmilk

Note: I'm actively testing this PR in #16195. Right now I'm running down a bunch of test failures.

tonyhutter avatar Jun 05 '24 00:06 tonyhutter

> Note: I'm actively testing this PR in #16195. Right now I'm running down a bunch of test failures.

I am back from holiday and will also help. I'll investigate the serial console thing first.

mcmilk avatar Jun 05 '24 03:06 mcmilk

It's not final.

The summary isn't ready and some debug things need to be removed.

Can I leave the Ubuntu tests out? Reason: we have 20 actions runners, this PR needs 15:

  • 1x for checkstyle
  • 1x for CodeQL
  • 13x for the different systems

I would like to add some SUSE distribution as well.

mcmilk avatar Jun 16 '24 08:06 mcmilk

Just to make things easier (and not use so many runners), you can exclude the debian* centos-stream* and archlinux runners, since we currently don't support them in buildbot. And when I say exclude, I mean just don't include them in zfs-linux.yml, but keep the rest of the support code you've written (like debian() and archlinux()).

tonyhutter avatar Jun 17 '24 22:06 tonyhutter

I think it's done now. We can remove the "Status: Work in Progress" badge....

@tonyhutter - What do you think?

mcmilk avatar Jul 17 '24 21:07 mcmilk

@mcmilk that's great news! I'll take a look once all the runners report back.

tonyhutter avatar Jul 17 '24 22:07 tonyhutter

> @mcmilk that's great news! I'll take a look once all the runners report back.

I force pushed again and removed centos-stream-9 and some debugging things within the scripts.

I have seen that you would like to split the tests into fractions like this: 1/3 2/3 ... do you want to add this later or is this just an idea?

mcmilk avatar Jul 18 '24 07:07 mcmilk

I have added FreeBSD 13.3 RELEASE and FreeBSD 14.1 RELEASE to the tests. It would be nice if we could also add Debian 11 + 12 to the tests by default.

mcmilk avatar Jul 19 '24 20:07 mcmilk

> I have seen that you would like to split the tests into fractions like this: 1/3 2/3 ... do you want to add this later or is this just an idea?

Correct, right now it's just an idea. I think it might help with some timing-related failures like:

almalinux8: auto_replace_002_pos
Fedora 40: zpool_status_008_pos

I also vaguely remember buildbot giving me issues if I ran with instances that had less than 8GB RAM. That's why I'm curious whether running 2 VMs with 8GB RAM might make many of these failures go away. I'm starting to get my variable-number-of-VMs code working with 2 VMs in my testing PR (https://github.com/tonyhutter/zfs/pull/1), but I haven't gotten a full run working yet. Once I can get a full run with 2 VMs tested, I want to compare its failures to the remaining failures in this PR. That will help us understand if the failures are timing/underpowered-VM related, or if we need to do some manual fixes to the tests.

tonyhutter avatar Jul 22 '24 21:07 tonyhutter

Oh no, I forgot the changed zfs-tests.sh script for this pull request :(

mcmilk avatar Jul 24 '24 17:07 mcmilk

Almalinux 8+9, Debian and the FreeBSD 13+14 systems should go green now.

mcmilk avatar Jul 24 '24 17:07 mcmilk

It would be easier and faster if the GitHub runners had 16 GB more RAM. I think the PR is ready now.

mcmilk avatar Aug 14 '24 12:08 mcmilk

@mcmilk I think we might be missing some stderr output on the QEMU builders. For example, here's the same ZTS bug (https://github.com/openzfs/zfs/issues/16439) on both builders:

QEMU:

  config.status: executing depfiles commands
  config.status: executing libtool commands
  config.status: executing po-directories commands
  make[2]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
    GEN      gitrev
  make  all-recursive
  make[3]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  Making all in include
  make[4]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
  make[4]: Nothing to be done for 'all'.
  make[4]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
  Making all in module
  make[4]: Entering directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
  mkdir -p os/linux/spl/
  mkdir -p avl/ icp/ icp/algs/aes/ icp/algs/blake3/ icp/algs/edonr/ icp/algs/modes/ icp/algs/sha2/ icp/algs/skein/ icp/api/ icp/asm-aarch64/blake3/ icp/asm-aarch64/sha2/ icp/asm-arm/sha2/ icp/asm-ppc64/blake3/ icp/asm-ppc64/sha2/ icp/asm-x86_64/aes/ icp/asm-x86_64/blake3/ icp/asm-x86_64/modes/ icp/asm-x86_64/sha2/ icp/core/ icp/io/ icp/spi/ lua/ lua/setjmp/ nvpair/ os/linux/zfs/ unicode/ zcommon/ zfs/ zstd/ zstd/lib/common/ zstd/lib/compress/ zstd/lib/decompress/
  make -C /usr/src/kernels/6.10.3-100.fc39.x86_64  \
  	  \
  	M="$PWD"  CONFIG_DEBUG_INFO=y CONFIG_ZFS=m modules
  make[5]: Entering directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
  make[5]: Leaving directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
  make[4]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
  make[3]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  make[2]: Leaving directory '/tmp/zfs-build-zfs-yeSNFC5X/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  
  RPM build warnings:
  
  RPM build errors:
  make[1]: Leaving directory '/home/zfs/zfs'

https://github.com/openzfs/zfs/actions/runs/10388084683/job/28762944809

BUILDBOT:

config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing po-directories commands
+ make -j2
make[2]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
  GEN      gitrev
make  all-recursive
make[3]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
Making all in include
make[4]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
make[4]: Nothing to be done for 'all'.
make[4]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/include'
Making all in module
make[4]: Entering directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
mkdir -p os/linux/spl/
mkdir -p avl/ icp/ icp/algs/aes/ icp/algs/blake3/ icp/algs/edonr/ icp/algs/modes/ icp/algs/sha2/ icp/algs/skein/ icp/api/ icp/asm-aarch64/blake3/ icp/asm-aarch64/sha2/ icp/asm-arm/sha2/ icp/asm-ppc64/blake3/ icp/asm-ppc64/sha2/ icp/asm-x86_64/aes/ icp/asm-x86_64/blake3/ icp/asm-x86_64/modes/ icp/asm-x86_64/sha2/ icp/core/ icp/io/ icp/spi/ lua/ lua/setjmp/ nvpair/ os/linux/zfs/ unicode/ zcommon/ zfs/ zstd/ zstd/lib/common/ zstd/lib/compress/ zstd/lib/decompress/
make -C /usr/src/kernels/6.10.3-100.fc39.x86_64  \
	  \
	M="$PWD"  CONFIG_DEBUG_INFO=y CONFIG_ZFS=m modules
make[5]: Entering directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
make[7]: *** No rule to make target '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module/os/linux/spl/spl-atomic.o', needed by '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module/spl.o'.  Stop.
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [/usr/src/kernels/6.10.3-100.fc39.x86_64/Makefile:1946: /tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module] Error 2
make[5]: *** [Makefile:252: __sub-make] Error 2
make[5]: Leaving directory '/usr/src/kernels/6.10.3-100.fc39.x86_64'
make[4]: *** [Makefile:56: modules-Linux] Error 2
make[4]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64/module'
make[3]: *** [Makefile:12324: all-recursive] Error 1
make[3]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
make[2]: *** [Makefile:4652: all] Error 2
make[2]: Leaving directory '/tmp/zfs-build-buildbot-2Wo4V8Y2/BUILD/zfs-kmod-2.2.99/_kmod_build_6.10.3-100.fc39.x86_64'
error: Bad exit status from /tmp/zfs-build-buildbot-2Wo4V8Y2/TMP/rpm-tmp.egsMTM (%build)

RPM build warnings:
    source_date_epoch_from_changelog set but %changelog is missing

RPM build errors:
    Bad exit status from /tmp/zfs-build-buildbot-2Wo4V8Y2/TMP/rpm-tmp.egsMTM (%build)
make[1]: *** [Makefile:14511: rpm-common] Error 1
make[1]: Leaving directory '/var/lib/buildbot/slaves/zfs/Fedora_39_x86_64__TEST_/build/zfs'
make: *** [Makefile:14445: rpm-kmod] Error 2

https://build.openzfs.org/builders/Fedora%2039%20x86_64%20%28TEST%29/builds/2491/steps/shell_1/logs/make

tonyhutter avatar Aug 14 '24 21:08 tonyhutter

I fixed these things:

  • the stderr messages are sent to the GitHub runner again
  • I rewrote the run() function completely; the return value of a failed command is now printed and used later
  • I also defined a DEBUG_MAX variable in qemu-7-summary.sh, so we don't output a really big debug file directly to the browser
  • rebased to master
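
As a rough illustration of the run() behaviour described above, here is a minimal sketch. It is not the function from this PR: the `RUN_RC` variable and the log format are hypothetical, and the real script may be structured quite differently.

```shell
#!/bin/sh
# Hypothetical sketch of a run() wrapper; not the function from this PR.
# It runs a command, folds stderr into stdout so errors reach the CI log,
# prints the exit status, and remembers the first failure for later steps.
RUN_RC=0
run() {
	echo "== running: $*"
	"$@" 2>&1
	rc=$?
	echo "== exit status: $rc"
	# Keep the first non-zero status so a summary step can report it
	if [ "$rc" -ne 0 ] && [ "$RUN_RC" -eq 0 ]; then
		RUN_RC=$rc
	fi
	return "$rc"
}

run true
run sh -c 'echo "oops" >&2; exit 3'
run true
echo "overall status: $RUN_RC"
```

Merging stderr into stdout is one simple way to avoid the kind of missing make errors seen in the QEMU log earlier in this thread.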

An older testrun with failing Fedora 39+40 is here: https://github.com/mcmilk/zfs/actions/runs/10414909636

TODO:

  • detect kernel hangs and show them explicitly
  • maybe restart such VMs and download the logfiles
  • increase DEBUG_MAX to around 400KB
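
A size cap like DEBUG_MAX could work along these lines. This is only a sketch under assumptions: the `print_capped` helper is hypothetical, the default of 409600 bytes is just the "around 400KB" figure from the list above, and keeping the tail rather than the head of the file is a guess.

```shell
#!/bin/sh
# Hypothetical sketch: cap how much of a debug file is printed to the log.
# DEBUG_MAX is the variable named in the thread; print_capped is made up here.
DEBUG_MAX="${DEBUG_MAX:-409600}"

print_capped() {
	file=$1
	# Arithmetic expansion strips any padding some wc versions add
	size=$(( $(wc -c < "$file") ))
	if [ "$size" -gt "$DEBUG_MAX" ]; then
		echo "[truncated: showing last $DEBUG_MAX of $size bytes]"
		# Assumption: the end of a log is usually the interesting part
		tail -c "$DEBUG_MAX" "$file"
	else
		cat "$file"
	fi
}
```

Calling `print_capped /path/to/debug.log` prints the whole file when it is small and only the last DEBUG_MAX bytes otherwise.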

mcmilk avatar Aug 16 '24 05:08 mcmilk

@mcmilk this will take care of the checkstyle issues:

diff --git a/scripts/zfs-tests.sh b/scripts/zfs-tests.sh
index 957e674be..fde2e4acb 100755
--- a/scripts/zfs-tests.sh
+++ b/scripts/zfs-tests.sh
@@ -1,4 +1,4 @@
-#!/usr/bin/env bash
+#!/bin/sh
 # shellcheck disable=SC2154
 #
 # CDDL HEADER START
@@ -215,8 +215,8 @@ find_runfile() {
 #
 split_tags() {
        # Get numerator and denominator
-       NUM=$(echo $TAGS | cut -d/ -f1)
-       DEN=$(echo $TAGS | cut -d/ -f2)
+       NUM=$(echo "$TAGS" | cut -d/ -f1)
+       DEN=$(echo "$TAGS" | cut -d/ -f2)
        # At the point this is called, RUNFILES will contain a comma separated
        # list of full paths to the runfiles, like:
        #
@@ -242,9 +242,12 @@ split_tags() {
        #
        # "append,atime,bootfs,cachefile,checksum,cp_files,deadman,dos_attributes, ..."
 
-       cat ${RUNFILES/,/ } | tr -d [],\' | awk '/tags = /{print $NF}' | sort | \
+       # Change the comma to a space for easy processing
+       _RUNFILES="$(echo """$RUNFILES""" | sed 's/,/ /g')"
+       # shellcheck disable=SC2002,SC2086
+       cat $_RUNFILES | tr -d "[],\'" | awk '/tags = /{print $NF}' | sort | \
                uniq | grep -v functional | \
-               awk -v num=$NUM -v den=$DEN '{ if(NR % den == (num - 1)) {printf "%s,",$0}}' | \
+               awk -v num="$NUM" -v den="$DEN" '{ if(NR % den == (num - 1)) {printf "%s,",$0}}' | \
                sed -E 's/,$//'
 }
 
@@ -568,7 +571,7 @@ RUNFILES=${R#,}
 #
 # "append,atime,bootfs,cachefile,checksum,cp_files,deadman,dos_attributes, ..."
 #
-if echo $TAGS | grep -Eq '^[0-9]+/[0-9]+$' ; then
+if echo "$TAGS" | grep -Eq '^[0-9]+/[0-9]+$' ; then
        TAGS=$(split_tags)
 fi
 
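To see how the N/M selection in split_tags picks tags, here is a small standalone sketch that reuses the awk expression from the patch above. The `split_demo` function and its fixed tag list are made up for illustration; the real script reads the tags from the runfiles.

```shell
#!/bin/sh
# Sketch of the N/M selection used by split_tags in the patch above.
# The tag list is hard-coded here; the real script extracts it from runfiles.
split_demo() {
	num=$1
	den=$2
	printf '%s\n' append atime bootfs cachefile checksum cp_files deadman |
		awk -v num="$num" -v den="$den" \
		    '{ if (NR % den == (num - 1)) { printf "%s,", $0 } }' |
		sed -E 's/,$//'
}

echo "$(split_demo 1 3)"   # lines 3 and 6:     bootfs,cp_files
echo "$(split_demo 2 3)"   # lines 1, 4 and 7:  append,cachefile,deadman
echo "$(split_demo 3 3)"   # lines 2 and 5:     atime,checksum
```

Because awk's NR starts at 1, fraction 1/3 selects the lines whose number is a multiple of 3, and the three fractions together cover every tag exactly once.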

tonyhutter avatar Aug 16 '24 20:08 tonyhutter

I am testing zram disks again; it looks like they will speed up the whole thing a lot.

The checkstyle fixups will get included, thank you.

mcmilk avatar Aug 16 '24 20:08 mcmilk