
Only extract node role from properly formatted node-role label

Open stbenjam opened this issue 1 year ago • 8 comments

The conventional format is node-role.kubernetes.io/role: "", not node-role.kubernetes.io: role. However, ROSA uses the latter format to indicate the infra role. This change makes the node watch code ignore that label, as well as other malformed variations such as node-role.kubernetes.io/ (the prefix with no role name after it).

The current code panics when run against a ROSA cluster:

  E0209 18:10:55.533265      78 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23])
  goroutine 233 [running]:
  k8s.io/apimachinery/pkg/util/runtime.logPanic({0x7a71840?, 0xc0018e2f48})
  	k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
  k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1000251f9fe?})
  	k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
  panic({0x7a71840, 0xc0018e2f48})
  	runtime/panic.go:884 +0x213
  github.com/openshift/origin/pkg/monitortests/node/watchnodes.nodeRoles(0x7ecd7b3?)
  	github.com/openshift/origin/pkg/monitortests/node/watchnodes/node.go:187 +0x1e5
  github.com/openshift/origin/pkg/monitortests/node/watchnodes.startNodeMonitoring.func1(0
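
The bounds in the panic message ([24:23]) are consistent with slicing a 23-character key, node-role.kubernetes.io, at offset 24, which is the length of the prefix including its trailing slash. A minimal sketch of a defensive extraction follows. It is not the code merged in this PR: rolesFromLabels and nodeRolePrefix are hypothetical names, and the real nodeRoles function may be structured differently.

```go
package main

import (
	"sort"
	"strings"
)

// nodeRolePrefix is the conventional label-key prefix, including the
// trailing slash. (Hypothetical constant name, chosen for this sketch.)
const nodeRolePrefix = "node-role.kubernetes.io/"

// rolesFromLabels is a hypothetical helper that returns the sorted role
// names found in a node's labels. It only accepts keys of the form
// node-role.kubernetes.io/<role> and skips anything else, including the
// bare key node-role.kubernetes.io and the empty-role key
// node-role.kubernetes.io/.
func rolesFromLabels(labels map[string]string) []string {
	roles := []string{}
	for key := range labels {
		// Require the full prefix with the slash, so a shorter key is
		// never sliced past its end.
		if !strings.HasPrefix(key, nodeRolePrefix) {
			continue
		}
		role := strings.TrimPrefix(key, nodeRolePrefix)
		if role == "" {
			continue
		}
		roles = append(roles, role)
	}
	sort.Strings(roles)
	return roles
}
```

Under these assumptions, a node labelled with both node-role.kubernetes.io/worker: "" and the ROSA-style node-role.kubernetes.io: infra would report only the worker role, and the malformed label would be ignored rather than causing a panic.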

stbenjam avatar Feb 09 '24 20:02 stbenjam

/hold

stbenjam avatar Feb 09 '24 20:02 stbenjam

/lgtm

Unhold when ready.

dgoodwin avatar Feb 13 '24 13:02 dgoodwin

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Feb 13 '24 13:02 openshift-ci[bot]

/retest

ritmun avatar Feb 13 '24 17:02 ritmun

/hold cancel

stbenjam avatar Feb 13 '24 18:02 stbenjam

/retest-required

stbenjam avatar Feb 19 '24 18:02 stbenjam

/hold

I have to get the bugs lined up so I can cherry-pick this back to the branches where folks are testing ROSA.

stbenjam avatar Feb 19 '24 22:02 stbenjam

@stbenjam: This pull request references Jira Issue OCPBUGS-29858, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Feb 22 '24 17:02 openshift-ci-robot

/hold cancel
/retest-required
/cherry-pick release-4.15

stbenjam avatar Feb 22 '24 17:02 stbenjam

@stbenjam: once the present PR merges, I will cherry-pick it on top of release-4.15 in a new PR and assign it to you.

In response to this:

/hold cancel
/retest-required
/cherry-pick release-4.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/cherry-pick release-4.14

stbenjam avatar Feb 22 '24 17:02 stbenjam

@stbenjam: once the present PR merges, I will cherry-pick it on top of release-4.14 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.14

/jira refresh

stbenjam avatar Feb 22 '24 17:02 stbenjam

@stbenjam: This pull request references Jira Issue OCPBUGS-29858, which is invalid:

  • expected the bug to target only the "4.16.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

openshift-ci-robot avatar Feb 22 '24 17:02 openshift-ci-robot

/jira refresh

stbenjam avatar Feb 22 '24 17:02 stbenjam

@stbenjam: This pull request references Jira Issue OCPBUGS-29858, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

In response to this:

/jira refresh

openshift-ci-robot avatar Feb 22 '24 17:02 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 93d6a84ccac34608e2f7ef425f5dc4ef7b948e6e and 2 for PR HEAD 7fd01a1abf76fee2b4ee4239620dde24770591c7 in total

openshift-ci-robot avatar Feb 22 '24 19:02 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 9c9713ed8c98e88b089ebca8ee5b7c1bba7423d2 and 1 for PR HEAD 7fd01a1abf76fee2b4ee4239620dde24770591c7 in total

openshift-ci-robot avatar Feb 22 '24 23:02 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 96d2578c1d54c9d1091f37746abb2858d2b876d9 and 0 for PR HEAD 7fd01a1abf76fee2b4ee4239620dde24770591c7 in total

openshift-ci-robot avatar Feb 23 '24 08:02 openshift-ci-robot

/hold

Revision 7fd01a1abf76fee2b4ee4239620dde24770591c7 was retested 3 times: holding

openshift-ci-robot avatar Feb 23 '24 11:02 openshift-ci-robot

/hold cancel
/retest-required

stbenjam avatar Feb 26 '24 02:02 stbenjam

@stbenjam: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-single-node-serial | 7fd01a1abf76fee2b4ee4239620dde24770591c7 | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-aws-ovn-single-node | 7fd01a1abf76fee2b4ee4239620dde24770591c7 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 7fd01a1abf76fee2b4ee4239620dde24770591c7 | link | false | /test e2e-aws-ovn-single-node-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Feb 26 '24 09:02 openshift-ci[bot]

/retest-required

Remaining retests: 0 against base HEAD 96d2578c1d54c9d1091f37746abb2858d2b876d9 and 2 for PR HEAD 7fd01a1abf76fee2b4ee4239620dde24770591c7 in total

openshift-ci-robot avatar Feb 26 '24 09:02 openshift-ci-robot

/override ci/prow/e2e-aws-ovn-serial

stbenjam avatar Feb 26 '24 13:02 stbenjam

/skip

stbenjam avatar Feb 26 '24 13:02 stbenjam

@stbenjam: Overrode contexts on behalf of stbenjam: ci/prow/e2e-aws-ovn-serial

In response to this:

/override ci/prow/e2e-aws-ovn-serial

openshift-ci[bot] avatar Feb 26 '24 13:02 openshift-ci[bot]

@stbenjam: Jira Issue OCPBUGS-29858: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-29858 has been moved to the MODIFIED state.

openshift-ci-robot avatar Feb 26 '24 13:02 openshift-ci-robot

@stbenjam: new pull request created: #28615

In response to this:

/hold cancel
/retest-required
/cherry-pick release-4.15

@stbenjam: new pull request created: #28616

In response to this:

/cherry-pick release-4.14

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-tests-container-v4.16.0-202402261639.p0.g330feda.assembly.stream.el8 for distgit openshift-enterprise-tests. All builds following this will include this PR.

openshift-bot avatar Feb 26 '24 18:02 openshift-bot