Re-running (flake detection) incorrectly reruns tests that did not flake

Open jlebon opened this issue 2 years ago • 1 comment

We've seen this in Jenkins output:

+ cosa kola run --rerun --no-test-exit-error --tag reprovision
kola -p qemu-unpriv --output-dir tmp/kola run --rerun --no-test-exit-error --tag reprovision
⚠️  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⚠️  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
🕒 Snoozing kola test pattern "ext.config.toolbox" until Sep 10 2022:
  👉 https://github.com/coreos/fedora-coreos-tracker/issues/1277
🕒 Snoozing kola test pattern "ext.config.extensions.*" until Sep 10 2022:
  👉 https://github.com/coreos/fedora-coreos-tracker/issues/1278
=== RUN   ext.config.root-reprovision.swap-before-root
=== RUN   ext.config.root-reprovision.luks
=== RUN   coreos.boot-mirror.luks
=== RUN   ext.config.root-reprovision.linear
=== RUN   ext.config.root-reprovision.filesystem-only
=== RUN   ext.config.root-reprovision.raid1
=== RUN   coreos.boot-mirror
--- PASS: ext.config.root-reprovision.swap-before-root (58.36s)
=== RUN   coreos.boot-mirror/sanity-check
=== RUN   coreos.boot-mirror/detach-primary
=== RUN   coreos.boot-mirror/verify-fallback
--- PASS: coreos.boot-mirror (277.71s)
    --- PASS: coreos.boot-mirror/sanity-check (1.48s)
    --- PASS: coreos.boot-mirror/detach-primary (208.45s)
    --- PASS: coreos.boot-mirror/verify-fallback (0.67s)
--- PASS: ext.config.root-reprovision.linear (58.12s)
--- PASS: ext.config.root-reprovision.filesystem-only (60.53s)
=== RUN   coreos.boot-mirror.luks/sanity-check
=== RUN   coreos.boot-mirror.luks/detach-primary
--- FAIL: coreos.boot-mirror.luks (1184.45s)
    --- PASS: coreos.boot-mirror.luks/sanity-check (1.26s)
    --- FAIL: coreos.boot-mirror.luks/detach-primary (600.95s)
            boot-mirror.go:232: Failed to reboot the machine: machine "26795fc0-96af-4cad-b81d-21f2a44974ba" failed to start: ssh journalctl failed: time limit exceeded
        cluster.go:184: "sudo mdadm --export --detail /dev/md/md-root" failed: output , status ssh: handshake failed: read tcp 127.0.0.1:55468->127.0.0.1:38863: read: connection reset by peer
--- PASS: ext.config.root-reprovision.luks (116.84s)
--- PASS: ext.config.root-reprovision.raid1 (71.84s)
FAIL, output in tmp/kola


======== Re-running failed tests (flake detection) ========

⚠️  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⚠️  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
🕒 Snoozing kola test pattern "ext.config.toolbox" until Sep 10 2022:
  👉 https://github.com/coreos/fedora-coreos-tracker/issues/1277
🕒 Snoozing kola test pattern "ext.config.extensions.*" until Sep 10 2022:
  👉 https://github.com/coreos/fedora-coreos-tracker/issues/1278
=== RUN   ext.config.root-reprovision.linear
=== RUN   ext.config.root-reprovision.filesystem-only
=== RUN   ext.config.root-reprovision.raid1
=== RUN   coreos.boot-mirror
=== RUN   ext.config.root-reprovision.swap-before-root
=== RUN   ext.config.root-reprovision.luks
=== RUN   coreos.boot-mirror.luks
--- PASS: ext.config.root-reprovision.linear (58.10s)
--- PASS: ext.config.root-reprovision.swap-before-root (58.45s)
--- PASS: ext.config.root-reprovision.luks (98.78s)
--- PASS: ext.config.root-reprovision.raid1 (76.26s)
=== RUN   coreos.boot-mirror.luks/sanity-check
=== RUN   coreos.boot-mirror.luks/detach-primary
=== RUN   coreos.boot-mirror.luks/verify-fallback
--- PASS: coreos.boot-mirror.luks (687.21s)
    --- PASS: coreos.boot-mirror.luks/sanity-check (1.30s)
    --- PASS: coreos.boot-mirror.luks/detach-primary (252.47s)
    --- PASS: coreos.boot-mirror.luks/verify-fallback (0.72s)
=== RUN   coreos.boot-mirror/sanity-check
=== RUN   coreos.boot-mirror/detach-primary

We should've only rerun coreos.boot-mirror.luks.
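
To make the expected behaviour concrete, here is a minimal sketch (not kola's actual implementation) of deriving the rerun set from the first run's output. It assumes top-level result lines have the form `--- FAIL: <name> (<time>s)` with no leading indentation, as in the log above; the `failed_tests` helper and `SAMPLE` excerpt are hypothetical names used only for illustration.

```python
import re

# Top-level result lines look like "--- FAIL: name (12.34s)".
# Subtest results are indented, so anchoring at "^---" skips them.
RESULT_RE = re.compile(r"^--- (PASS|FAIL): (\S+) \(\d+(?:\.\d+)?s\)$")


def failed_tests(log_text):
    """Return the names of top-level tests that failed, in order of appearance."""
    failed = []
    for line in log_text.splitlines():
        m = RESULT_RE.match(line)
        if m and m.group(1) == "FAIL":
            failed.append(m.group(2))
    return failed


# Excerpt of the first run's results from the Jenkins output above.
SAMPLE = """\
--- PASS: coreos.boot-mirror (277.71s)
    --- PASS: coreos.boot-mirror/sanity-check (1.48s)
--- FAIL: coreos.boot-mirror.luks (1184.45s)
    --- PASS: coreos.boot-mirror.luks/sanity-check (1.26s)
    --- FAIL: coreos.boot-mirror.luks/detach-primary (600.95s)
--- PASS: ext.config.root-reprovision.luks (116.84s)
"""

print(failed_tests(SAMPLE))  # ['coreos.boot-mirror.luks']
```

Under these assumptions, the rerun would be given exactly this list of names instead of re-selecting tests from the original arguments.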

jlebon, Aug 23 '22 13:08

I think the problem was introduced by adding the --tag reprovision argument. Otherwise, I think the code is behaving appropriately.

dustymabe, Aug 23 '22 13:08