zfs icon indicating copy to clipboard operation
zfs copied to clipboard

ZTS: Add timeout to cp_stress

Open tonyhutter opened this issue 1 year ago • 4 comments

Motivation and Context

cp_stress ZTS test can timeout after 10min on underpowered test systems

Description

cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the QEMU-based testers are so under-powered that the test is taking longer than the 10min maximum that ZTS enforces. Instead, enforce an inter-test-cp_stress timeout so the entire test doesn't get killed.

How Has This Been Tested?

Ran on FreeBSD 13 and Ubuntu 24.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [ ] My code follows the OpenZFS code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [ ] All commit messages are properly formatted and contain Signed-off-by.

tonyhutter avatar Jun 24 '24 18:06 tonyhutter

Looks right for what it is, but I wonder: what kind of "underpowered" are we talking here? Because seekflood is more likely to hit things if there are more CPUs. So if there's only one or two here, is it even worth running this?

A little background: I'm currently helping test https://github.com/openzfs/zfs/pull/15838. The idea there is to convert our buildbot builders to github runners. Or more specifically, run ZTS within VMs on the github runners (using the runner as a VM host). We need to use VMs, since github runners don't natively support all the OSs we test on (like Fedora, Almalinux, FreeBSD, etc). In order to speed up the testing and not run into the 4-hour testing timeout, we're chunking up the the tests into thirds and running it on three VMs running in parallel. That unfortunately means each each VM is going over-provisioned and run slower. That's fine for most ZTS tests since they aren't very CPU bound, but there are some outliers like cp_stress that do run too long and get killed.

So yes, we ideally would want faster runners with more CPUs since they're going to expose race conditions and locking bugs like this quicker. But we also have to make due with what's available.

tonyhutter avatar Jun 25 '24 00:06 tonyhutter

Oh I'm sorry, my question wasn't clear (also it was more of a drive-by mumble than a really big-brained thought).

I more meant, if we know this test is running in an environment where it's unlikely to actually bump into the thing it's checking for (because it isn't parallel enough), should we disable the test in that environment. Like, should it SKIP if it has fewer than N cores or something?

I suppose even "unlikely" is better than "never" though!

robn avatar Jun 25 '24 00:06 robn

I think it's still a useful test to run no matter what the hardware is. This popped up today: https://github.com/openzfs/zfs/issues/16297

tonyhutter avatar Jun 25 '24 00:06 tonyhutter

Oof, hard to argue with results. Also yikes, I'll check it out! :sweat_smile:

robn avatar Jun 25 '24 01:06 robn

I created another pull request for this problem here: #16369

This one just takes down the RUNS to the limit of the FreeBSD testings, and then the test work fine on Linux also. The cp_stress test needs around 2 minutes on FreeBSD and Linux then...

mcmilk avatar Jul 20 '24 07:07 mcmilk

@tonyhutter - can be closed now :+1:

mcmilk avatar Jul 22 '24 21:07 mcmilk

Closing in favor of https://github.com/openzfs/zfs/pull/16369

tonyhutter avatar Jul 25 '24 23:07 tonyhutter