coreos-assembler icon indicating copy to clipboard operation
coreos-assembler copied to clipboard

mantle/kola: copy basic tests into formal kola workflow

Open prestist opened this issue 1 year ago • 19 comments

cmd-kola has non-conformed basic kola tests. These tests do not benefit from the supporting functionality around kola test's such as denylisting or other functionality. reduce cmd-kola and conform the basic kola tests.

fixes https://github.com/coreos/coreos-assembler/issues/3418

This is a small rework on an already existing; but stale pr. Cherry-picked and made a few touches.

prestist avatar Nov 07 '23 20:11 prestist

hmm, I might have missed it but I did not see what triggers the CI to run kola -p qemu --build latest run --rerun --allow-rerun-success=tags=needs-internet --on-warn-failure-exit-77 --arch=x86_64 --basic-qemu-scenarios

If someone could point me to it I can make a pr and remove it.

prestist avatar Nov 07 '23 21:11 prestist

hmm, I might have missed it but I did not see what triggers the CI to run kola -p qemu --build latest run --rerun --allow-rerun-success=tags=needs-internet --on-warn-failure-exit-77 --arch=x86_64 --basic-qemu-scenarios

If someone could point me to it I can make a pr and remove it.

We also need to touch coreos-ci-lib

ravanelli avatar Nov 08 '23 15:11 ravanelli

hmm, I might have missed it but I did not see what triggers the CI to run kola -p qemu --build latest run --rerun --allow-rerun-success=tags=needs-internet --on-warn-failure-exit-77 --arch=x86_64 --basic-qemu-scenarios If someone could point me to it I can make a pr and remove it.

We also need to touch coreos-ci-lib

awesome; made a pr. thank you

prestist avatar Nov 08 '23 20:11 prestist

Trying to summarize what we'll run:

x86_64 aarch64 ppc64le s390x
boot type bios/uefi/uefi-secure bios bios bios
nvme yes no no no
Number of combination 6 1 1 1

So the question is: Should we really run all 6 combination for the test matrix on x86_64 or just a subset of those for nvme? For now we were only running basic (nvme), uefi, uefi-secure as Michael said.

Shouldn't we run uefi & uefi-secure boot on aarch64 as well? Feels like that should be working.

EDIT: Re-reading the PR, this runs all 6 ones for x86_64 so should be OK.

travier avatar Nov 09 '23 14:11 travier

Re-reading the PR, this runs all 6 ones for x86_64 so should be OK

I'm good with that. It makes sense that we would test bios, uefi, and uefi-secure with both nvme and non-nvme conditions on x86_64.

So we just need to make sure that the nvme and uefi/uefi-secure tests dont run on other arches.

Shouldn't we run uefi & uefi-secure boot on aarch64 as well? Feels like that should be working.

Unless we want these ones to run on aarch64.

marmijo avatar Nov 09 '23 17:11 marmijo

So we just need to make sure that the nvme and uefi/uefi-secure tests dont run on other arches.

If thats the case It sounds like I need to update the field Architectures to only run x86 on all but bios?

prestist avatar Nov 09 '23 18:11 prestist

If thats the case It sounds like I need to update the field Architectures to only run x86 on all but bios?

Yes, I believe that should do it!

marmijo avatar Nov 09 '23 19:11 marmijo

So overall I think there are at least two main ways I see we can approach this:

1. Have knobs as part of `register.Test` to say e.g. which firmware to use for the test.

2. Use `ClusterSize: 0` and manually bring up the VMs ourselves. This is done in a bunch of QEMU tests already where we need more control over the VM settings. (Do a `git grep 'ClusterSize: \+0'` to see them.)

Leaning towards the latter to avoid having to add a bunch more knobs in register.Test. You can have a single function that sets up the tests and takes arguments for the variable knobs and each test calls that one function with the different combinations we want (see the luks.* tests for an example of this pattern).

Re. the combinations, my vote is to match the current coverage and then we can extend it in the future. But not strongly against either!

Ok thank you for the pointer. I am looking into doing this.

prestist avatar Nov 14 '23 21:11 prestist

@jlebon thank you for your review, and thats a fair question, I did not check it when I pushed. It was before; but I need to double check.

Thank you for your time and feedback, and I will try and get to this in the next week or so.

prestist avatar Feb 22 '24 19:02 prestist

RHCOS CI will be failing for this PR because we remove the option it uses so this is expected.

travier avatar Feb 26 '24 12:02 travier

Ok locally all tests are passing :)

prestist avatar Feb 27 '24 21:02 prestist

This looks great!

Thank you for all of your help and back and forth on this review, its been a learning experience for me, and Its been interesting to learn more about kola.

Another safer option is to change coreos-ci-lib to check if kola list has e.g. basic.nvme (or that cosa kola --help doesn't have --basic-qemu-scenarios). This is a signal that it's working with a cosa that has this PR. If yes, then use the new logic proposed in coreos/coreos-ci-lib#154. If no, then use the old logic. Then we can merge this PR, verify it works well, etc... and work on backporting it to older branches at our own pace. Once all backports are done, we clean up the old logic entirely from coreos-ci-lib.

Alright lets def opt for this, I can see about making changes to the coreos-ci-lib pr here shortly.

prestist avatar Feb 29 '24 16:02 prestist

Another PR to prepare is the update for the RHCOS CI script but it should be a fairly small change.

travier avatar Mar 07 '24 13:03 travier

https://github.com/openshift/os/blob/master/ci/prow-entrypoint.sh#L126

travier avatar Mar 07 '24 13:03 travier

Since https://github.com/coreos/coreos-ci-lib/pull/154 just landed I should be able to re-run these test and get it passing.

prestist avatar Mar 11 '24 20:03 prestist

@travier Thank you pr here => https://github.com/openshift/os/pull/1458

prestist avatar Mar 11 '24 21:03 prestist

/retest

prestist avatar Mar 12 '24 19:03 prestist

and work on backporting it to older branches at our own pace. Once all backports are done, we clean up the old logic entirely from coreos-ci-lib.

I talked to Steven about this yesterday. If the backports apply relatively easily, let's do that. Otherwise, it's fine also to carry that coreos-ci-lib logic for now until we fully drop older streams that are not worth backporting to.

jlebon avatar Mar 15 '24 18:03 jlebon

/retest

jlebon avatar Mar 20 '24 16:03 jlebon

Pushed a trivial commit to fix the Prow CI issue.

/approve /lgtm

jlebon avatar Mar 20 '24 20:03 jlebon

/retest

prestist avatar Mar 26 '24 13:03 prestist

Looked like the other commit might not had made it to the pipeline before the retest had started as we got some funk. Retesting to see.

prestist avatar Mar 26 '24 13:03 prestist

@jlebon thank you!

prestist avatar Mar 26 '24 13:03 prestist

@jlebon why would prow have this error --basic-qemu-scenarios, it seems odd as our changes have landed in https://github.com/coreos/coreos-ci-lib/pull/154

prestist avatar Mar 26 '24 14:03 prestist

/test ci/prow/images

travier avatar Apr 10 '24 13:04 travier

@travier: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

  • /test images
  • /test rhcos

Use /test all to run all jobs.

In response to this:

/test ci/prow/images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Apr 10 '24 13:04 openshift-ci[bot]

/test images

travier avatar Apr 10 '24 13:04 travier

Needs https://github.com/coreos/coreos-assembler/pull/3775

travier avatar Apr 10 '24 13:04 travier

/retest

travier avatar Apr 10 '24 13:04 travier

I think the bug that was breaking CI here should be fixed by https://github.com/openshift/os/pull/1478. We just need to get https://github.com/coreos/coreos-assembler/pull/3775 in, and then this should be unblocked.

jlebon avatar Apr 10 '24 14:04 jlebon