os-autoinst-distri-opensuse

Add wmp_simple test to maintenance tests for hana

Open frankenmichl opened this issue 1 year ago • 18 comments

We should run the wmp_simple test in the maintenance tests to verify that it works as intended.

frankenmichl avatar May 23 '23 11:05 frankenmichl

Verification Runs:

  • 15-SP2: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=VR4PR17134&groupid=311
  • 15-SP3: https://openqa.suse.de/tests/overview?groupid=372&build=VR4PR17134&distri=sle&version=15-SP3

I could not find 15-SP4 jobs with still-valid repos to clone. We could clone the jobs from the QU, but they use a different schedule: schedule/sles4sap/hana/hana_cluster_node.yaml. Would it make sense to also add the test for Quarterly Releases and for Product Build Validation?

alvarocarvajald avatar May 23 '23 11:05 alvarocarvajald
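The overview links above all share the same query parameters (distri, version, build, groupid), so they can be generated rather than typed by hand. A minimal sketch, assuming only what is visible in the URLs in this thread; `overview_url` is a hypothetical helper, not part of openQA or of this repository:

```python
from urllib.parse import urlencode

# Host and parameter names are copied from the links in this thread.
OPENQA_HOST = "https://openqa.suse.de"


def overview_url(version, build="VR4PR17134", distri="sle", groupid=None):
    """Build a test-overview URL for one SLE version (hypothetical helper)."""
    params = {"distri": distri, "version": version, "build": build}
    if groupid is not None:
        # groupid is optional: later links in this thread omit it.
        params["groupid"] = groupid
    return f"{OPENQA_HOST}/tests/overview?{urlencode(params)}"


if __name__ == "__main__":
    # Version/group pairs taken from the two links above.
    for version, groupid in [("15-SP2", 311), ("15-SP3", 372)]:
        print(overview_url(version, groupid=groupid))
```

The order of query parameters does not matter to the server, which is why the links in this thread list them in varying order.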

@frankenmichl VRs in 15-SP3 worked, but in 15-SP2 there seems to be an issue with one of the repositories the test attempts to add:

https://openqa.suse.de/tests/11187675#step/wmp_simple/54

Also, is it expected that it tries to add a repo for 15-SP3 for a test running in 15-SP2?

alvarocarvajald avatar May 24 '23 13:05 alvarocarvajald

I will look into this issue. I think we should use the right repo for the version of SLE.

I wonder how this worked for 15-SP3, as there is only a repository for 15.4 or 15.5. The package builds for 15-SP2 and 15-SP3 - I think we can add it to QA-Head repo and use it from there.

frankenmichl avatar May 24 '23 14:05 frankenmichl
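To illustrate "use the right repo for the version of SLE": the comment above refers to repositories as 15.4 or 15.5 while the jobs are named 15-SP4 or 15-SP5, so one way to avoid adding a 15-SP3 repo on a 15-SP2 system is to derive the dotted form from the running version and compare. This is only a sketch; the function names and the example URL are made up, and the real test module is not written in Python:

```python
import re


def sle_to_dotted(version):
    """Convert '15-SP2' to '15.2'. Assumption: a GA release '15' maps to '15.0'."""
    match = re.fullmatch(r"(\d+)(?:-SP(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised SLE version: {version!r}")
    major = match.group(1)
    service_pack = match.group(2) or "0"
    return f"{major}.{service_pack}"


def repo_matches_version(repo_url, version):
    """Rough check that a repo URL mentions the dotted form of the running version."""
    return sle_to_dotted(version) in repo_url


if __name__ == "__main__":
    # Placeholder URL, not a real repository.
    url = "https://example.invalid/repo/15.3/"
    print(repo_matches_version(url, "15-SP2"))
    print(repo_matches_version(url, "15-SP3"))
```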

> I wonder how this worked for 15-SP3, as there is only a repository for 15.4 or 15.5. The package builds for 15-SP2 and 15-SP3 - I think we can add it to QA-Head repo and use it from there.

Just copypac or link on IBS from SUSE:Factory:Head/stress-ng to QA:Head

czerw avatar May 26 '23 06:05 czerw

Now that we have merged https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17156, we could return to this. We no longer need a custom package and can just use stress-ng from packagehub.

frankenmichl avatar Jun 13 '23 11:06 frankenmichl

@alvarocarvajald would you mind having another look please?

frankenmichl avatar Jul 10 '23 12:07 frankenmichl

Scheduled verification runs in most supported SPs:

  • 15-SP5: https://openqa.suse.de/tests/overview?version=15-SP5&build=VR4PR17134&distri=sle&groupid=494
  • 15-SP4: https://openqa.suse.de/tests/overview?groupid=441&build=VR4PR17134&distri=sle&version=15-SP4
  • 15-SP3: https://openqa.suse.de/tests/overview?groupid=372&build=VR4PR17134&distri=sle&version=15-SP3
  • 15-SP2: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=VR4PR17134&groupid=311
  • 12-SP5: https://openqa.suse.de/tests/overview?groupid=302&distri=sle&build=VR4PR17134&version=12-SP5

I suppose if VRs pass, then this can be merged.

Cc: @bs-suse @emiura @lpalovsky

alvarocarvajald avatar Aug 04 '23 12:08 alvarocarvajald

Results from above:

  • 15-SP5: support server job finished due to MAX_JOB_TIME. No idea what the problem was, and since it has been 16 days, a simple restart of the jobs will fail as the incident repo is no longer available. I'll clone new jobs.
  • 15-SP4: tests passed.
  • 15-SP3: there was a syntax error in kernel/wmp_simple: https://openqa.suse.de/tests/11741393#step/wmp_simple/27. Could be a transient issue. I'll also clone new jobs.
  • 15-SP2: failed in ha/ha_cluster_init: https://openqa.suse.de/tests/11741527#step/ha_cluster_init/16. This is unrelated to the PR. I'll also clone new jobs.
  • 12-SP5: failed in ha/ha_cluster_join: https://openqa.suse.de/tests/11741530#step/ha_cluster_join/16. This is unrelated to the PR. I'll also clone new jobs.

We've been having a lot of issues with Multi Machine jobs' networking recently in osd. It's possible the failures in 15-SP5, 15-SP2 and 12-SP5 are related to this. Let's see how the new jobs behave.

Can you take a look at the failure in kernel/wmp_simple in case it was not a transient error?

alvarocarvajald avatar Aug 22 '23 16:08 alvarocarvajald

Now that the MM & NFS issues seem solved, starting new verification runs:

  • ~15-SP5: https://openqa.suse.de/tests/overview?distri=sle&groupid=494&version=15-SP5&build=VR4PR17134~
  • ~15-SP4: https://openqa.suse.de/tests/overview?build=VR4PR17134&version=15-SP4&groupid=441&distri=sle~
  • ~15-SP3: https://openqa.suse.de/tests/overview?build=VR4PR17134&version=15-SP3&groupid=372&distri=sle~
  • ~15-SP2: https://openqa.suse.de/tests/overview?build=VR4PR17134&version=15-SP2&groupid=311&distri=sle~
  • ~15-SP1: https://openqa.suse.de/tests/overview?version=15-SP1&distri=sle&groupid=300&build=VR4PR17134~
  • ~12-SP5: https://openqa.suse.de/tests/overview?build=VR4PR17134&distri=sle&groupid=302&version=12-SP5~

Removed those jobs as they failed on an unrelated step. Please rebase, and I will try again.

alvarocarvajald avatar Sep 22 '23 15:09 alvarocarvajald

@alvarocarvajald @frankenmichl could you please check status of this PR?

czerw avatar Oct 17 '23 08:10 czerw

I just rebased again

frankenmichl avatar Oct 17 '23 09:10 frankenmichl

> @alvarocarvajald @frankenmichl could you please check status of this PR?

I'll schedule VRs again to check.

alvarocarvajald avatar Oct 17 '23 13:10 alvarocarvajald

  • ~12-SP5: https://openqa.suse.de/tests/overview?distri=sle&build=VR4PR17134&version=12-SP5~
  • 15-SP1: https://openqa.suse.de/tests/overview?distri=sle&build=VR4PR17134&version=15-SP1
  • 15-SP2: https://openqa.suse.de/tests/overview?version=15-SP2&distri=sle&build=VR4PR17134
  • 15-SP3: https://openqa.suse.de/tests/overview?version=15-SP3&build=VR4PR17134&distri=sle
  • 15-SP4: https://openqa.suse.de/tests/overview?distri=sle&build=VR4PR17134&version=15-SP4
  • 15-SP5: https://openqa.suse.de/tests/overview?version=15-SP5&distri=sle&build=VR4PR17134

alvarocarvajald avatar Oct 17 '23 13:10 alvarocarvajald

  • 15-SP5: https://openqa.suse.de/tests/overview?version=15-SP5&distri=sle&build=VR4PR17134

Most of the VRs are either still running or have been restarted (they failed due to pp#135980, which is unrelated to this PR). The 15-SP5 jobs have already finished, though, and they failed in the wmp_simple module on both nodes.

@frankenmichl can you take a look: https://openqa.suse.de/tests/12562920#step/wmp_simple/21 & https://openqa.suse.de/tests/12562918#step/wmp_simple/21

alvarocarvajald avatar Oct 18 '23 12:10 alvarocarvajald

More results:

  • wmp_simple failed in 15-SP1 on both node 1 and node 2
  • Passed on 15-SP2 and 15-SP4
  • There were failures not related to this PR in 12-SP5 and 15-SP3. I restarted those tests.

alvarocarvajald avatar Oct 19 '23 10:10 alvarocarvajald

Restarted jobs in 12-SP5 were failing as the incident repos are gone, so I scheduled new ones:

https://openqa.suse.de/tests/overview?build=VR4PR17134&version=12-SP5&distri=sle

alvarocarvajald avatar Oct 20 '23 16:10 alvarocarvajald

Failure in 15-SP3: https://openqa.suse.de/tests/12601620#step/wmp_simple/27

alvarocarvajald avatar Oct 20 '23 19:10 alvarocarvajald

Giving the Verification Runs another go:

  • 12-SP5: https://openqa.suse.de/tests/overview?build=VR4PR17134&distri=sle&version=12-SP5
  • 15-SP1: https://openqa.suse.de/tests/overview?distri=sle&build=VR4PR17134&version=15-SP1
  • 15-SP2: https://openqa.suse.de/tests/overview?distri=sle&build=VR4PR17134&version=15-SP2
  • 15-SP3: https://openqa.suse.de/tests/overview?version=15-SP3&distri=sle&build=VR4PR17134
  • 15-SP4: https://openqa.suse.de/tests/overview?version=15-SP4&distri=sle&build=VR4PR17134
  • 15-SP5: https://openqa.suse.de/tests/overview?version=15-SP5&distri=sle&build=VR4PR17134

alvarocarvajald avatar Dec 08 '23 16:12 alvarocarvajald

Running new verification runs:

  • 12-SP5: https://openqa.suse.de/tests/overview?version=12-SP5&build=VR4PR17134&distri=sle :green_circle:
  • 15-SP2: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=VR4PR17134 :green_circle:
  • 15-SP3: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP3&build=VR4PR17134
  • 15-SP4: https://openqa.suse.de/tests/overview?build=VR4PR17134&version=15-SP4&distri=sle :red_circle:
  • 15-SP5: https://openqa.suse.de/tests/overview?version=15-SP5&build=VR4PR17134&distri=sle :red_circle:

alvarocarvajald avatar Apr 24 '24 15:04 alvarocarvajald

@czerw FYI, tests in 15-SP4 and 15-SP5 failed on kernel/wmp_simple. Both fail on the command `systemd-cgls -u SAP.slice | grep 'wmp-.*.scope' | cut -c 3-`. There is one possibly important difference, however: the 15-SP4 test configures sapwmp while the 15-SP5 test doesn't. The failure is the same in both, though.

I restarted 15-SP3 jobs.

alvarocarvajald avatar Apr 26 '24 08:04 alvarocarvajald
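For readers unfamiliar with the failing pipeline: `grep 'wmp-.*.scope'` keeps only the lines that mention a wmp scope, and `cut -c 3-` then drops the first two characters of each remaining line. A minimal sketch that mimics just those two filter stages on already captured text; the sample lines are invented for illustration and are not real `systemd-cgls` output, and the sketch only handles plain ASCII input:

```python
import re


def wmp_scopes(captured_text):
    """Mimic `grep 'wmp-.*.scope' | cut -c 3-` on captured text (ASCII input assumed)."""
    # Same pattern as in the thread; note that the dot before "scope" is
    # unescaped, so it matches any character, not only a literal ".".
    pattern = re.compile(r"wmp-.*.scope")
    matching = [line for line in captured_text.splitlines() if pattern.search(line)]
    # cut -c 3- keeps everything from the third character onwards.
    return [line[2:] for line in matching]


if __name__ == "__main__":
    # Invented sample lines, NOT real systemd-cgls output.
    sample = "  wmp-demo.scope\n  other.service\n"
    print(wmp_scopes(sample))
    # When no wmp scope exists, the result is simply empty.
    print(wmp_scopes("  other.service\n"))
```

If no wmp scope is present under SAP.slice, the filter yields nothing, which is one situation worth ruling out when looking at the failures above.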

> @czerw FYI, tests in 15-SP4 and 15-SP5 failed on kernel/wmp_simple. Both fail on the command `systemd-cgls -u SAP.slice | grep 'wmp-.*.scope' | cut -c 3-`. There is one possibly important difference, however: the 15-SP4 test configures sapwmp while the 15-SP5 test doesn't. The failure is the same in both, though.
>
> I restarted 15-SP3 jobs.

@frankenmichl @schlad please check above failures

czerw avatar Apr 26 '24 08:04 czerw

Shouldn't 15-SP5 have dropped WMP? If so, maybe we should not even try to run it there.

frankenmichl avatar Apr 26 '24 09:04 frankenmichl

> Shouldn't 15-SP5 have dropped WMP? If so, maybe we should not even try to run it there.

Yes, but what about the failure on 15-SP4?

czerw avatar Apr 30 '24 06:04 czerw
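If it does turn out that WMP is not available from 15-SP5 onwards (this is only the question raised above, not something confirmed in this thread), the test could be gated on the version. A sketch under that assumption; the cut-off constant and the function names are hypothetical:

```python
import re

# ASSUMPTION taken from the question above, not a confirmed fact:
# WMP is treated as dropped starting with 15-SP5.
WMP_DROPPED_FROM = (15, 5)


def parse_sle(version):
    """Turn '15-SP4' into (15, 4); a GA release such as '15' becomes (15, 0)."""
    match = re.fullmatch(r"(\d+)(?:-SP(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised SLE version: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def should_schedule_wmp(version):
    """Schedule wmp_simple only on versions older than the assumed cut-off."""
    return parse_sle(version) < WMP_DROPPED_FROM


if __name__ == "__main__":
    for version in ["12-SP5", "15-SP4", "15-SP5"]:
        print(version, should_schedule_wmp(version))
```

Comparing (major, service pack) tuples avoids the pitfalls of comparing version strings directly.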

After some minor fixes we now have successful VRs:

  • 12-SP5: https://openqa.suse.de/tests/14577457
  • 15-SP2: https://openqa.suse.de/tests/14577454
  • 15-SP3: https://openqa.suse.de/tests/14579211
  • 15-SP4: https://openqa.suse.de/tests/14577445
  • 15-SP5: https://openqa.suse.de/tests/14577440

frankenmichl avatar Jun 12 '24 12:06 frankenmichl

Oh I just noticed I have some small fix missing :( will post new VRs in a minute

frankenmichl avatar Jun 12 '24 13:06 frankenmichl

There was a wrong condition that led to stressing 0 bytes of memory. I updated the verification runs:

  • 12-SP5: https://openqa.suse.de/tests/14594016
  • 15-SP2: https://openqa.suse.de/tests/14594014
  • 15-SP3: https://openqa.suse.de/tests/14594010
  • 15-SP4: https://openqa.suse.de/tests/14594006
  • 15-SP5: https://openqa.suse.de/tests/14593999

frankenmichl avatar Jun 12 '24 13:06 frankenmichl
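The thread does not show the condition that was wrong, so the following is not the actual fix. It is only a generic sketch of a guard that would make a "stress 0 bytes" situation fail loudly instead of passing silently; `stress_bytes` and its parameters are hypothetical:

```python
def stress_bytes(total_bytes, fraction):
    """Return how many bytes to stress, refusing to return 0 (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError(f"fraction must be in (0, 1], got {fraction}")
    amount = int(total_bytes * fraction)
    if amount <= 0:
        # A zero-sized workload would make the stress step a no-op.
        raise ValueError("refusing to stress 0 bytes of memory")
    return amount


if __name__ == "__main__":
    print(stress_bytes(1024, 0.5))
```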