NO-JIRA: Allow multiple attempts in egress firewall test

Open ldoktor opened this issue 9 months ago • 29 comments

This test fails when using kata-containers, possibly because kata-containers take longer to start:

curl: (28) Connection timed out after 1001 milliseconds

Let's use curl's "--retry" feature. Successful tests should be unaffected, since they return immediately, while failing tests might take 30s instead of 3s. With kata we need about 6-12s, so 30s should be safe for us.

ldoktor avatar Mar 21 '25 07:03 ldoktor

@knobunc @trozet Hello folks, this is my first contribution to openshift/origin. Should I keep rebasing this PR, or should I wait for a review first?

ldoktor avatar Mar 26 '25 12:03 ldoktor

CC: @neisw @bertinatto could you please take a look at this? Should I rebase or simply wait for a review?

ldoktor avatar Apr 08 '25 09:04 ldoktor

Hi @ldoktor, go ahead and rebase. Typically you would get a review from the team responsible for the test (looks like sdn team). Also if you know of a job that typically shows this failure it would be good to run that job for validation along with the regular presubmits.

neisw avatar Apr 08 '25 15:04 neisw

Thank you, I wasn't sure. It's rebased. The failed job log is here: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-kata-containers-kata-containers-main-e2e-tests/1903703902650372096. I also manually tested my code with extra debug output: it usually takes 3-4 retries before it's ready, and the test passes with this exact commit as well.

ldoktor avatar Apr 09 '25 13:04 ldoktor

/payload-job periodic-ci-kata-containers-kata-containers-main-e2e-tests

neisw avatar Apr 09 '25 18:04 neisw

@neisw: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

openshift-ci[bot] avatar Apr 09 '25 18:04 openshift-ci[bot]

No luck running /payload-job periodic-ci-kata-containers-kata-containers-main-e2e-tests. This is a small enough change and the presubmits look fine, so I can go ahead and tag it. Do you have a jira for this work?

neisw avatar Apr 09 '25 18:04 neisw

This is related to upstream testing so we don't have any jira for it.

ldoktor avatar Apr 10 '25 04:04 ldoktor

/lgtm

you probably want to retitle with NO-JIRA: then

neisw avatar Apr 11 '25 13:04 neisw

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ldoktor, neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Apr 11 '25 13:04 openshift-ci[bot]

@ldoktor: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Apr 14 '25 05:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD ed54e26255946eadff39f970e3c3e74e7d2923eb and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 14 '25 06:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 787ed136ce39f301318e7ace9fc7b4ad9782ca53 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 14 '25 12:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0c7519e3b3ceb1fbd62209e059ff4fb48646c5b0 and 1 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 14 '25 17:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0c7519e3b3ceb1fbd62209e059ff4fb48646c5b0 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 15 '25 14:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0c7519e3b3ceb1fbd62209e059ff4fb48646c5b0 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 15 '25 17:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0c7519e3b3ceb1fbd62209e059ff4fb48646c5b0 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 16 '25 01:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f5a8115b477f87c2244397b2aa29fc5013e376e9 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 16 '25 08:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f5a8115b477f87c2244397b2aa29fc5013e376e9 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 16 '25 15:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f5a8115b477f87c2244397b2aa29fc5013e376e9 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 17 '25 19:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f5a8115b477f87c2244397b2aa29fc5013e376e9 and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 17 '25 22:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 11058f61841d55772243db759a748dd7fc84703d and 1 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 18 '25 01:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 11058f61841d55772243db759a748dd7fc84703d and 2 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 18 '25 08:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD dc67a3ac10b8c1c641227e6f5caa4aa9b6997404 and 1 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 18 '25 13:04 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD bff03608d61ffc3adbb2520684ec1f66d1c1dc39 and 0 for PR HEAD c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 in total

openshift-ci-robot avatar Apr 21 '25 14:04 openshift-ci-robot

/hold

Revision c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5 was retested 3 times: holding

openshift-ci-robot avatar Apr 21 '25 18:04 openshift-ci-robot

Hello folks, the only required test https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29615/pull-ci-openshift-origin-main-e2e-aws-ovn-serial/1914323994480218112 failed with

: [sig-network][Feature:EgressIP][apigroup:operator.openshift.io] [external-targets][apigroup:user.openshift.io][apigroup:security.openshift.io] pods should have the assigned EgressIPs and EgressIPs can be updated [Skipped:Network/OpenShiftSDN] [Serial] [Suite:openshift/conformance/serial] 2m22s
{  fail [github.com/openshift/origin/test/extended/networking/egressip.go:695]: Timed out after 120.000s.
Expected
    <bool>: false
to be true
Ginkgo exit error 1: exit with code 1}

which is a different test than the one I'm touching. How should I proceed to get this little improvement merged?

ldoktor avatar Apr 22 '25 11:04 ldoktor

Job Failure Risk Analysis for sha: c0e7f893acc8818e0d7d5b8b724067c2fdf2d4d5

Job Name: pull-ci-openshift-origin-main-e2e-aws-disruptive
Failure Risk: Medium
[sig-node] static pods should start after being created
Potential external regression detected for High Risk Test analysis
---
[bz-Etcd] clusteroperator/etcd should not change condition/Available
Potential external regression detected for High Risk Test analysis

Job Name: pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2
Failure Risk: IncompleteTests
Tests for this run (24) are below the historical average (1428): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt[bot] avatar May 15 '25 05:05 openshift-trt[bot]

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Aug 14 '25 01:08 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot avatar Sep 13 '25 08:09 openshift-bot