
OCPNODE-2877: Remove support to configure cgroupsv1 in OCP

Open sairameshv opened this issue 10 months ago • 40 comments

  • Removes support for configuring cgroupsv1 in OCP clusters.
  • Removes "v1" from the enum validation of the cgroupMode field of the nodes.config.openshift.io object.
  • Adds integration tests to validate the enum removal on the cgroupMode field.

Enhancement Proposal Ref: https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-cgroupsv2-support.md

Summary:

  • This PR blocks users from setting cgroupMode to v1.
  • A change will be added to MCO in 4.18 that sets the machine-config cluster operator's Upgradeable=False when cgroupMode is found to be v1 and asks users to update to v2.
  • All clusters upgrading to 4.19 must first update to the minimum 4.18.z version containing the above change. This can be achieved through the cincinnati-graph-data repo.
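
For illustration, the effect of the enum removal can be sketched in Go as below. This is a minimal sketch, not the actual change in this PR: the `CgroupMode` type and constant names, the `allowedCgroupModes` set, and the `validateCgroupMode` helper are hypothetical, and it assumes the empty value stays allowed alongside "v2". In the real API the check is enforced by the CRD schema rather than by hand-written code.

```go
package main

import "fmt"

// CgroupMode mirrors the string-typed cgroupMode field discussed above.
// The type and constant names here are illustrative assumptions.
type CgroupMode string

const (
	CgroupModeEmpty CgroupMode = ""   // assumed to remain valid (platform default)
	CgroupModeV1    CgroupMode = "v1" // rejected once "v1" is dropped from the enum
	CgroupModeV2    CgroupMode = "v2"
)

// allowedCgroupModes is a hypothetical stand-in for the enum after "v1" is removed.
var allowedCgroupModes = map[CgroupMode]bool{
	CgroupModeEmpty: true,
	CgroupModeV2:    true,
}

// validateCgroupMode rejects any value outside the allowed set,
// which is roughly what enum validation does when an object is written.
func validateCgroupMode(mode CgroupMode) error {
	if !allowedCgroupModes[mode] {
		return fmt.Errorf("unsupported cgroupMode %q", mode)
	}
	return nil
}

func main() {
	for _, mode := range []CgroupMode{CgroupModeEmpty, CgroupModeV2, CgroupModeV1} {
		if err := validateCgroupMode(mode); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %q\n", mode)
	}
}
```

Because the value is rejected on write, an existing cluster that still has "v1" stored would fail validation on its next update of the object, which is why the upgrade gating described in the summary matters.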

sairameshv avatar Jan 30 '25 15:01 sairameshv

@sairameshv: This pull request references OCPNODE-2877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

According to this, RHEL is going to remove the cgroupsv1 support from RHEL 10 and hence there is a need to remove it from the OCP as well.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Jan 30 '25 15:01 openshift-ci-robot

Hello @sairameshv! Some important instructions when contributing to openshift/api: API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

openshift-ci[bot] avatar Jan 30 '25 15:01 openshift-ci[bot]

/jira refresh

sairameshv avatar Jan 30 '25 15:01 sairameshv

@sairameshv: This pull request references OCPNODE-2877 which is a valid jira issue.

In response to this:

/jira refresh

openshift-ci-robot avatar Jan 30 '25 15:01 openshift-ci-robot

/hold Until updated enhancement proposal for cgroup v1 removal is merged

sairameshv avatar Feb 06 '25 11:02 sairameshv

@sairameshv: This pull request references OCPNODE-2877 which is a valid jira issue.

In response to this:

According to this, RHEL is going to remove the cgroupsv1 support from RHEL 10 and hence there is a need to remove it from the OCP as well.

Added a CEL validation to deny the setting of "v1" to the cgroupMode field of nodes.config.openshift.io object

openshift-ci-robot avatar Feb 20 '25 06:02 openshift-ci-robot

/test verify

sairameshv avatar Feb 20 '25 10:02 sairameshv

@sairameshv: This pull request references OCPNODE-2877 which is a valid jira issue.

In response to this:

Removing support to configure cgroupsv1 in the OCP clusters. Added a CEL validation on the cgroupMode field of the nodes.config.openshift.io object to deny the setting of "v1"

Enhancement Proposal Ref: https://github.com/openshift/enhancements/pull/1751

openshift-ci-robot avatar Feb 20 '25 16:02 openshift-ci-robot

/retest

sairameshv avatar Feb 20 '25 16:02 sairameshv

@sairameshv: This pull request references OCPNODE-2877 which is a valid jira issue.

In response to this:

  • Removing support to configure cgroupsv1 in the OCP clusters.
  • Added a CEL validation on the cgroupMode field of the nodes.config.openshift.io object to deny the setting of "v1"
  • Also added integration tests to validate the newly introduced CEL validation on the cgroupMode field

Enhancement Proposal Ref: https://github.com/openshift/enhancements/pull/1751

openshift-ci-robot avatar Feb 26 '25 17:02 openshift-ci-robot

/retest

sairameshv avatar Feb 28 '25 02:02 sairameshv

/retest

sairameshv avatar Feb 28 '25 07:02 sairameshv

/lgtm

thanks!

haircommander avatar Feb 28 '25 16:02 haircommander

/retest

sairameshv avatar Feb 28 '25 23:02 sairameshv

/retest

sairameshv avatar Mar 03 '25 03:03 sairameshv

Changes LGTM, how do we know this is safe? Can you please explain in the PR description what has been done in 4.18 that makes this a safe change in 4.19?

JoelSpeed avatar Mar 04 '25 15:03 JoelSpeed

@sairameshv: This pull request references OCPNODE-2877 which is a valid jira issue.

In response to this:

  • Removing support to configure cgroupsv1 in the OCP clusters.
  • Removed the enum validation of "v1" for the cgroupMode field of the nodes.config.openshift.io object.
  • Also added integration tests to validate the enum removal on the cgroupMode field

Enhancement Proposal Ref: https://github.com/openshift/enhancements/pull/1751

openshift-ci-robot avatar Mar 04 '25 15:03 openshift-ci-robot

Changes LGTM, how do we know this is safe? Can you please explain in the PR description what has been done in 4.18 that makes this a safe change in 4.19?

As described in the Goals section of the enhancement proposal, the machine-config cluster operator's Upgradeable condition gets set to False when a cluster is found to be on CgroupModeV1. Also, we would make the 4.18.z release containing this change the minimum version required before upgrading to 4.19. I've updated the description with the above explanation as well.

sairameshv avatar Mar 04 '25 16:03 sairameshv

Also, we would make the 4.18.z cluster containing this change as a minimum cluster before upgrading to 4.19.

Which change do you mean? Is there something in 4.18 that already blocks upgrades if cgroups mode is v1, or is that still work to do?

JoelSpeed avatar Mar 04 '25 16:03 JoelSpeed

Also, we would make the 4.18.z cluster containing this change as a minimum cluster before upgrading to 4.19.

Which change do you mean? Is there something in 4.18 that already blocks upgrades if cgroups mode is v1, or is that still work to do?

The change still needs to be added

sairameshv avatar Mar 04 '25 16:03 sairameshv

yeah I think we should

/hold

on this until we have the Upgradeable=False condition in MCO and the upgrade edge defined in cincinnati

haircommander avatar Mar 04 '25 16:03 haircommander

The change still needs to be added

Do this first. Once you have that logic in 4.18.z and set the minimum upgrade version in the upgrade graph, I'm happy to then merge this API PR to remove the value from the enum
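
The gating logic being asked for can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the MCO implementation: the `Condition` struct, the `upgradeableCondition` helper, and the reason strings are all hypothetical.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a cluster operator
// status condition; the real type has more fields.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// upgradeableCondition reports Upgradeable=False while the cluster is still
// configured for cgroups v1, and Upgradeable=True otherwise.
// The reason strings are made up for this sketch.
func upgradeableCondition(cgroupMode string) Condition {
	if cgroupMode == "v1" {
		return Condition{
			Type:    "Upgradeable",
			Status:  "False",
			Reason:  "CgroupModeV1Configured",
			Message: "cgroupMode is v1; update nodes.config.openshift.io to v2 before upgrading",
		}
	}
	return Condition{Type: "Upgradeable", Status: "True", Reason: "AsExpected"}
}

func main() {
	for _, mode := range []string{"v1", "v2", ""} {
		c := upgradeableCondition(mode)
		fmt.Printf("cgroupMode=%q -> %s=%s (%s)\n", mode, c.Type, c.Status, c.Reason)
	}
}
```

The condition alone only helps on releases that contain it, which is why the minimum 4.18.z version in the upgrade graph is needed as well.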

JoelSpeed avatar Mar 04 '25 16:03 JoelSpeed

/hold cancel

/lgtm

/retest

https://github.com/openshift/machine-config-operator/pull/4921 and https://github.com/openshift/cincinnati-graph-data/pull/6948 have merged, so this is ready

haircommander avatar Mar 20 '25 16:03 haircommander

@JoelSpeed can you override the verify-crd-schema job? We're making it mad by dropping an enum value

haircommander avatar Mar 20 '25 16:03 haircommander

/lgtm

/override ci/prow/verify-crd-schema

The enum removal is safe as we have an upgrade block that prevents upgrades into this version of the API, implemented per https://github.com/openshift/api/pull/2181#issuecomment-2741027104
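
As a rough illustration of why a schema check objects to this kind of change, the sketch below compares an old and a new enum and reports values that were dropped. It is not the verify-crd-schema job's actual logic; the `removedEnumValues` helper and the enum contents are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// removedEnumValues returns the values present in oldEnum but missing from
// newEnum. Any such value may still be stored in existing objects, so
// dropping it is normally treated as a breaking schema change.
func removedEnumValues(oldEnum, newEnum []string) []string {
	kept := make(map[string]struct{}, len(newEnum))
	for _, v := range newEnum {
		kept[v] = struct{}{}
	}
	var removed []string
	for _, v := range oldEnum {
		if _, ok := kept[v]; !ok {
			removed = append(removed, v)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Assumed enum contents before and after the change.
	oldEnum := []string{"v1", "v2", ""}
	newEnum := []string{"v2", ""}
	fmt.Println(removedEnumValues(oldEnum, newEnum)) // prints [v1]
}
```

Here the override is justified because the upgrade block keeps clusters with "v1" stored from ever reaching this version of the API.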

JoelSpeed avatar Mar 20 '25 16:03 JoelSpeed

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, JoelSpeed, sairameshv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Mar 20 '25 16:03 openshift-ci[bot]

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-crd-schema

In response to this:

/lgtm

/override ci/prow/verify-crd-schema

The enum removal is safe as we have an upgrade block that prevents upgrades into this version of the API, implemented per https://github.com/openshift/api/pull/2181#issuecomment-2741027104

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Mar 20 '25 16:03 openshift-ci[bot]

/retest-required

Remaining retests: 0 against base HEAD 75d64d71980b0e5f126c9a8b0c9423a808adc3e2 and 2 for PR HEAD daced88f841bd58ef41afc08fa55cc4fefbca20a in total

openshift-ci-robot avatar Mar 20 '25 19:03 openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 75d64d71980b0e5f126c9a8b0c9423a808adc3e2 and 2 for PR HEAD daced88f841bd58ef41afc08fa55cc4fefbca20a in total

openshift-ci-robot avatar Mar 21 '25 00:03 openshift-ci-robot