Failed to get original machineset in ARO
I created a brand-new ARO cluster and attempted to apply an UpgradeConfig CR to it with capacityReservation: true. The operator fails with the error "failed to get original machineset", which doesn't make sense to me because I haven't touched the original machinesets other than to scale them out from 1 to 2 machines. The stack trace in the error also doesn't make clear what action I need to take to resolve the issue.
UpgradeConfig CR:
apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
  namespace: openshift-managed-upgrade-operator
spec:
  type: "ARO"
  upgradeAt: "2024-03-08T15:35:00Z"
  PDBForceDrainTimeout: 60
  capacityReservation: true
  desired:
    channel: "stable-4.12"
    version: "4.12.26"
The machineset:
$ oc get machineset -A
NAMESPACE               NAME                                     DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   aro-cluster-5c2tn-k4hhb-worker-eastus1   2         2         2       2           16h
openshift-machine-api   aro-cluster-5c2tn-k4hhb-worker-eastus2   2         2         2       2           16h
openshift-machine-api   aro-cluster-5c2tn-k4hhb-worker-eastus3   2         2         2       2           16h
Logs from MUO pod:
$ oc logs managed-upgrade-operator-6d7d6d8d65-2mwlx -f -n openshift-managed-upgrade-operator
ts=2024-03-08T15:40:46.358441985Z level=info logger=controller_upgradeconfig msg="Reconciling UpgradeConfig" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:46.994941661Z level=info logger=controller_upgradeconfig msg="Current cluster status" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config status=Upgrading
ts=2024-03-08T15:40:46.994972261Z level=info logger=controller_upgradeconfig msg="Cluster detected as already upgrading." Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.578525781Z level=info logger=controller_upgradeconfig msg="running step StartedNotificationSent" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.701591604Z level=info logger=controller_upgradeconfig msg="running step ClusterHealthyBeforeUpgrade" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.77726004Z level=info logger=controller_upgradeconfig msg="running step ExternalDependenciesAvailable" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.874410242Z level=info logger=controller_upgradeconfig msg="No external dependencies configured for availability checks. Skipping." Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.874441943Z level=info logger=controller_upgradeconfig msg="running step ComputeCapacityReserved" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.899053847Z level=info logger=controller_upgradeconfig msg="failed to get machineset" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.899081947Z level=error logger=controller_upgradeconfig msg="error when ComputeCapacityReserved" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config error="failed to get original machineset" stacktrace="github.com/openshift/managed-upgrade-operator/pkg/upgradesteps.Run\n\t/workdir/pkg/upgradesteps/runner.go:30\ngithub.com/openshift/managed-upgrade-operator/pkg/upgraders.(*clusterUpgrader).runSteps\n\t/workdir/pkg/upgraders/upgrader.go:63\ngithub.com/openshift/managed-upgrade-operator/pkg/upgraders.(*aroUpgrader).UpgradeCluster\n\t/workdir/pkg/upgraders/aroupgrader.go:86\ngithub.com/openshift/managed-upgrade-operator/controllers/upgradeconfig.(*ReconcileUpgradeConfig).upgradeCluster\n\t/workdir/controllers/upgradeconfig/upgradeconfig_controller.go:253\ngithub.com/openshift/managed-upgrade-operator/controllers/upgradeconfig.(*ReconcileUpgradeConfig).Reconcile\n\t/workdir/controllers/upgradeconfig/upgradeconfig_controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"
ts=2024-03-08T15:40:47.910533589Z level=error msg="Reconciler error" controller=upgradeconfig controllerGroup=upgrade.managed.openshift.io controllerKind=UpgradeConfig UpgradeConfig="{managed-upgrade-config openshift-managed-upgrade-operator}" namespace=openshift-managed-upgrade-operator name=managed-upgrade-config reconcileID=b3169141-583d-4d3b-be9c-8efbae168f3d error="1 error occurred:\n\t* failed to get original machineset\n\n" stacktrace="sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"
I'm running 4.12.25 on ARO, subscribed to the stable-4.12 channel:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.25   True        False         15h     Cluster version is 4.12.25
@kevchu3 For the ARO upgrade type, capacity reservation is disabled by default. However, it looks like you have enabled it in the UpgradeConfig CR. It seems the MUO running in your cluster is not the latest version, which contains fix #382.
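For illustration, here is a minimal sketch of the same UpgradeConfig with capacity reservation turned off. It assumes every other field from the original report stays unchanged. Whether this alone avoids the error on an older MUO build has not been verified in this thread, so treat it as something to try rather than a confirmed fix.

```yaml
apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
  namespace: openshift-managed-upgrade-operator
spec:
  type: "ARO"
  upgradeAt: "2024-03-08T15:35:00Z"
  PDBForceDrainTimeout: 60
  # Assumption: setting this to false should skip the extra-capacity
  # scale-up that the ComputeCapacityReserved step performs.
  capacityReservation: false
  desired:
    channel: "stable-4.12"
    version: "4.12.26"
```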
@a7vicky This makes sense. As a follow-up, is there anywhere in the docs that says capacity reservation is not supported in ARO? Ideally it would be in the FAQs of this GitHub project.
/close
@kevchu3: Closing this issue.