avocado nrunner seems to not cleanup properly on timeout

Describe the bug nrunner seems to not call tearDown() when a testcase times out

Steps to reproduce I have testcases which need some background processes. They are started by the testcase like this:

self.dnsmasq = avocado.utils.process.SubProcess(cmdline, ...)

And are stopped this way:

    def tearDown(self):
        self.log.info('### stop processes')
        if self.dnsmasq is not None:
            self.dnsmasq.stop()

Expected behavior When the test is done tearDown() is called for cleanup.

Current behavior When the test case times out (result 'INTERRUPTED') tearDown() is apparently not called, so I sometimes have leftover processes hanging around after the test suite is done.

System information (please complete the following information):

Fedora 36
Avocado 97.0
Installed via pip3 --user

Additional information Used to work fine with the old runner, so I switched back to that, hoping nrunner gets fixed before the old runner is removed.

Which worked for a number of other nrunner issues, but not this one ...

test cases repo: https://gitlab.com/kraxel/edk2-tests

Jun 21 '22 09:06 kraxel

@kraxel thanks for reporting this issue.

Interrupting the test and then moving the execution to the same (test) instance tearDown() method would be very hard to implement. I can see two ways around this:

Implement specific resource trackers in Avocado. Those can include processes and files in known (or configurable) locations that Avocado itself will clean up post test execution (including interruption).
Evolve the other better isolated spawners such as podman. The way the podman spawner behave is that the container will be destroyed upon test interruption, so all children processes would be automatically removed too.

Bear in mind that option two currently has a number of shortcomings, but we intend to make it no different from a local test execution.

Would you, as a developer/user, fancy one over the other?

Jun 21 '22 15:06 clebergnu

option two shortcoming #1 -- seems to not be documented very well. Is there something more useful the object reference (which doesn't tell me how I can use it) ? https://avocado-framework.readthedocs.io/en/latest/api/plugins/avocado.plugins.spawners.html#module-avocado.plugins.spawners.podman

Without that it's hard to judge. From a design point of view running inside a container and leave the cleanup to the container runtime makes sense to me.

What are the known shortcomings?

How does the podman spawner handle networking? I have some tests which create an isolated network namespace, then hook up dhcp, dns, tftp, http and qemu to that namespace for network boot testing. Can that be done with the podman spawner? Or maybe I don't have to create a network namespace myself then because podman creates one for the container anyway?

Jun 22 '22 08:06 kraxel

option two shortcoming #1 -- seems to not be documented very well. Is there something more useful the object reference (which doesn't tell me how I can use it) ? https://avocado-framework.readthedocs.io/en/latest/api/plugins/avocado.plugins.spawners.html#module-avocado.plugins.spawners.podman

True. We do expect to make this more visible as it grows in capabilities. Expect an update to the documentation shortly.

Anyway, right now the podman spawner is only visible by doing:

$ avocado plugins
...
Plugins that spawn tasks and know about their status (spawner): 
podman  Podman (container) based spawner
process Process based spawner
...

And the usage message is available when running avocado run --help:

...
  --nrunner-spawner SPAWNER
                        Spawn tasks in a specific spawner. Available spawners:
                        'process' and 'podman'
...
podman spawner specific options:
  --spawner-podman-bin PODMAN_BIN
                        Path to the podman binary
  --spawner-podman-image CONTAINER_IMAGE
                        Image name to use when creating the container. The
                        first default choice is a container image matching the
                        current OS. If unable to detect, default becomes the
                        latest Fedora release. Default on this system:
                        fedora:36
  --spawner-podman-avocado-egg AVOCADO_EGG
                        Avocado egg path to be used during initial bootstrap
                        of avocado inside the isolated environment. By
                        default, Avocado will try to download (or get from
                        cache) an egg from its repository.

Without that it's hard to judge. From a design point of view running inside a container and leave the cleanup to the container runtime makes sense to me.

What are the known shortcomings?

There may be many shortcomings, some inherent from the container technology itself, and some from Avocado's use of the technology. But, there are not many known at this point due to its lack of widespread use. I can think of the fact that the container environment needs to have a functional Python stack at this moment (but nothing else).

Its strengths when paired with Avocado, on the other hand, include:

The automatic provisioning (in a non-intrusive way) of the Avocado runtime in the container (for all currently supported Python versions)
An automatic read-only volume mounting of the base location of your test files into the container (so that test files and associated data file are found)

How does the podman spawner handle networking? I have some tests which create an isolated network namespace, then hook up dhcp, dns, tftp, http and qemu to that namespace for network boot testing. Can that be done with the podman spawner? Or maybe I don't have to create a network namespace myself then because podman creates one for the container anyway?

The communication between Avocado running the job (and spawning tests in a container) and the actual "avocado-runner-*" running the test inside the container are not dependent on networking proper (that's the short answer). So it should be possible to use any of the podman networking capabilities, including its own isolated network namespace indeed. Avocado currently doesn't allow control over the networking options for the podman container it creates, but that's an easy fix.

In short, Avocado is betting that container based workloads make a lot of sense for functional/integration testing, so a lot of new features are being developed with this mindset. One example of a very cool feature under development is this.

Long story short, all the dependencies in a test will be attempted to be fulfilled transparently in the environment they are running (including podman containers), and those environments will be cached for future use. If you need to install packages, download files (and more to come) for your test to run successfully, you just need to describe them in the test. Example:

from avocado import Test
from avocado.utils import process

class TFTP(Test):
    """
    avocado: dependency={"type": "package", "name": "tftp-server"}
    """
    def test(self):
        self.log.info(process.run("in.tftpd --version"))

On first run, Avocado will prepare the container images. On subsequent runs, Avocado will just reuse them.

Let me know if that sounds valuable and an extra incentive to investigate the use option two with the edk2 tests.

Jun 24 '22 21:06 clebergnu

In order to move this issue forward, I'll attempt to run the e2dk tests (and examples given in the issue description) with the podman spawner.

Jan 17 '23 16:01 clebergnu

You might want revert 9c79f0c5ddacc172bca8b1d86c384614672a5423 then. That wraps qemu into a 'timeout' call so in case it hangs for whatever reason it gets killed with SIGALARM and the cleanup functions get a chance to run.

Jan 18 '23 09:01 kraxel

avocado avocado copied to clipboard

nrunner seems to not cleanup properly on timeout

avocado
avocado copied to clipboard