CNTRLPLANE-945: WIP: Add tests for ExternalOIDC

Open everettraven opened this issue 5 months ago • 4 comments
everettraven avatar Jun 12 '25 19:06 everettraven

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar Jun 12 '25 19:06 openshift-ci[bot]

@everettraven: This pull request references CNTRLPLANE-945 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Jun 12 '25 19:06 openshift-ci-robot

/test all

everettraven avatar Jun 16 '25 18:06 everettraven

/test all

everettraven avatar Jun 17 '25 18:06 everettraven

/test all

everettraven avatar Jul 02 '25 17:07 everettraven

Job Failure Risk Analysis for sha: 5d8b75523aa32095f3584cc5bd42146a4b0352bb

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade Medium
Job run should complete before timeout
This test has passed 96.51% of 3980 runs on release 4.20 [Overall] in the last week.

openshift-trt[bot] avatar Jul 02 '25 21:07 openshift-trt[bot]

/test all

everettraven avatar Jul 10 '25 14:07 everettraven

Job Failure Risk Analysis for sha: 81377a473413c3bf3f006143102f68e661c1011f

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Medium
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
Potential external regression detected for High Risk Test analysis

Open Bugs
etcdMembersDown should not fire on healthy etcd scaling event
pull-ci-openshift-origin-main-e2e-gcp-disruptive Medium
[bz-Etcd] clusteroperator/etcd should not change condition/Available
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-kube-api apiserver/kube-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
Showing 4 of 5 test results

openshift-trt[bot] avatar Jul 10 '25 19:07 openshift-trt[bot]

@everettraven: This pull request references CNTRLPLANE-945 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

This PR adds the tests necessary to promote the ExternalOIDC and ExternalOIDCWithUIDAndExtraClaimMappings feature-gates on OpenShift.

It adds a new test suite specific to tests that test the external OIDC provider authentication mode. Some tests are intentionally marked as skipped as the functionality does not yet exist to test, but I wanted to still provide the skeleton for these tests so that we can easily make some updates when that functionality is implemented.

openshift-ci-robot avatar Jul 15 '25 16:07 openshift-ci-robot

Job Failure Risk Analysis for sha: effd33f37b50a8a568c19c1b4a1b145cd234ff40

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-publicnet-2of2 Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-proxy Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.

openshift-trt[bot] avatar Jul 15 '25 23:07 openshift-trt[bot]

We could move the keycloak_*.go files into their own package; it'll make it clearer to reuse functionality in other tests if ever needed.

Is it likely that we use this beyond testing authentication? IMO we can extract this whenever we have a use case to re-use it. For now, I imagine we only care to use it for authentication related tests.

everettraven avatar Jul 16 '25 12:07 everettraven

/hold

botched rebase

everettraven avatar Jul 16 '25 17:07 everettraven

Job Failure Risk Analysis for sha: 68dad7befe4bbc951328cb57dcc87e6c8f17c2ee

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (106) are below the historical average (1225): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt[bot] avatar Jul 16 '25 18:07 openshift-trt[bot]

/hold cancel

everettraven avatar Jul 16 '25 19:07 everettraven

/retest

everettraven avatar Jul 17 '25 12:07 everettraven

Job Failure Risk Analysis for sha: fa7b649181060d0710359956ffdf4f23ddda236d

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (31) are below the historical average (960): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (1057): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt[bot] avatar Jul 17 '25 17:07 openshift-trt[bot]

/retest-required

everettraven avatar Jul 18 '25 14:07 everettraven

Thanks for the changes @everettraven :+1:

/lgtm

liouk avatar Jul 18 '25 14:07 liouk

/retest

everettraven avatar Jul 18 '25 20:07 everettraven

/retest

everettraven avatar Jul 22 '25 14:07 everettraven

Job Failure Risk Analysis for sha: 0b3f84e164cadb566da5936c8e6996b40589fb12

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (18) are below the historical average (537): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (19) are below the historical average (3894): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (18) are below the historical average (593): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 IncompleteTests
Tests for this run (19) are below the historical average (1855): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt[bot] avatar Jul 22 '25 17:07 openshift-trt[bot]

So the intention is to run those FG tests in separate jobs that exercise the new suite? Just want to confirm that none of the FG tests here will be exercised by existing conformance jobs, so you might need to create different variants of jobs to have enough coverage to graduate the feature.

@xueqzhan Correct. I have https://github.com/openshift/release/pull/66980 up to add a new periodic job to run the new suite. It doesn't yet contain all the variants needed to graduate the feature, but I intend to expand it once this merges and I've determined the proper run time for this suite (it needs to be a long-running suite because each test needs to roll out a new revision of the KAS).

everettraven avatar Jul 22 '25 17:07 everettraven

Looks like there is a build error I need to look into: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29917/pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2/1947670304780193792

/hold

everettraven avatar Jul 22 '25 17:07 everettraven

Build issues were due to a need to rebase and properly pick up changes from an o/api bump I missed. Should be fixed now.

/hold cancel

everettraven avatar Jul 22 '25 17:07 everettraven

/retest

everettraven avatar Jul 23 '25 12:07 everettraven

/retest

everettraven avatar Jul 23 '25 18:07 everettraven

/approve

xueqzhan avatar Jul 23 '25 19:07 xueqzhan

Job Failure Risk Analysis for sha: e72069ca0446bb012fc2c80b525763c11da3f258

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (30) are below the historical average (305): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Medium
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
Potential external regression detected for High Risk Test analysis

Open Bugs
etcd platform pod exist test failing on etcd-scaling jobs
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (339): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling High
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
This test has passed 99.93% of 4157 runs on release 4.20 [Overall] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time

openshift-trt[bot] avatar Jul 24 '25 00:07 openshift-trt[bot]

Job Failure Risk Analysis for sha: e72069ca0446bb012fc2c80b525763c11da3f258

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (30) are below the historical average (220): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Medium
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
Potential external regression detected for High Risk Test analysis

Open Bugs
etcd platform pod exist test failing on etcd-scaling jobs
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (230): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling High
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
This test has passed 99.93% of 4157 runs on release 4.20 [Overall] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time

openshift-trt[bot] avatar Jul 24 '25 00:07 openshift-trt[bot]

Job Failure Risk Analysis for sha: bca83171c1f29d906ac4ada2ed8ad06c8702e5ba

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (106) are below the historical average (202): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
---
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
[CI] e2e-openstack-ovn-etcd-scaling job permanent fails at many openshift-test tests
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (213): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Medium
[sig-network] pods should successfully create sandboxes by adding pod to network
This test has passed 97.91% of 4311 runs on release 4.20 [Overall] in the last week.

Open Bugs
Component Readiness: pods should successfully create sandboxes by adding pod to network: expected pod UID "aa853924-c6c6-45b7-be56-e059960bc3c6" but got "ab26e0dc-d736-4945-aa02-91fa3f066cdc" from Kube API
"[sig-network] pods should successfully create sandboxes by adding pod to network" fails often on compact CI jobs

openshift-trt[bot] avatar Jul 24 '25 22:07 openshift-trt[bot]