mas upgrade for one instance requires Manage on a different instance to be up and running.
MAS CLI version
quay.io/ibmmas/cli:13.15.0
CLI function used
upgrade
What happened?
First of all, this is an air-gapped environment. I ran the "mas upgrade" command; the instance selection gave me three options. I entered the second instance (inst2) and proceeded with the upgrade.
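For reference, this is roughly how the CLI is launched here (the container tag is the one listed above; the exact docker flags and the oc login step are just how I run it in this environment):

```shell
# Start the MAS CLI container and log in to the target cluster,
# then run the interactive upgrade
docker run -ti --rm quay.io/ibmmas/cli:13.15.0
oc login --token=<token> --server=<cluster-api-url>
mas upgrade
```

The interactive prompts looked like this: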
- Instance Selection
  Select a MAS instance to upgrade from the list below:
  - inst1 v8.11.7
  - inst2 v8.11.7
  - inst3 v8.11.7
  Enter MAS instance ID: inst2
- License Terms
  To continue with the upgrade, you must accept the license terms:
  - https://ibm.biz/MAS90-License
  - https://ibm.biz/MaximoIT90-License
  - https://ibm.biz/MAXArcGIS90-License
  Do you accept the license terms? [y/n] y
- Review Settings
  Instance ID ..................... inst2
  Current MAS Channel ............. 8.11.x
  Next MAS Channel ................ 9.0.x
  Skip Pre-Upgrade Checks ......... False
  Proceed with these settings? [y/n] y
- Launch Upgrade
  ✅️ OpenShift Pipelines Operator is installed and ready to use
  ✅️ Namespace is ready (mas-inst2-pipelines)
  ✅️ Latest Tekton definitions are installed (v13.15.0)
  ✅️ PipelineRun for inst2 upgrade submitted
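The submitted PipelineRun can be followed with standard oc commands (namespace taken from the output above; the pod name below is a placeholder, not from my cluster):

```shell
# Watch the upgrade PipelineRun and its task pods
oc get pipelineruns -n mas-inst2-pipelines
oc get pods -n mas-inst2-pipelines
# Tail the pre-check step's log (substitute the real pod name)
oc logs -f -n mas-inst2-pipelines <upgrade-pipelinerun-pod> -c step-ocp-verify-workloads
```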
Now the upgrade cannot continue because Manage on instance inst3 is not up. Every instance should be totally independent of the others. Why should one instance need to be up in order to upgrade a different instance?
The pre-upgrade-check step fails here:
TASK [ibm.mas_devops.ocp_verify : Check Deployment & StatefulSet Status] *******
Checking Deployments are healthy (1/40 retries with a 300 second delay)
[NOTREADY] mas-inst3-manage/inst3-masgolf-all = 1 replicas/None ready/1 updated/None available
Finished check: Delaying 300 seconds before next check
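From the Kubernetes side, the flagged workload can be inspected directly (a quick sketch using the namespace and deployment name from the message above; nothing here is specific to the MAS CLI):

```shell
# Inspect the deployment the pre-check is waiting on
oc get deployment inst3-masgolf-all -n mas-inst3-manage
oc describe deployment inst3-masgolf-all -n mas-inst3-manage
# Check whether its pods are failing to schedule or crash-looping
oc get pods -n mas-inst3-manage
```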
Relevant log output
step-ocp-verify-workloads
Export all env vars defined in /workspace/settings
Using /opt/app-root/src/ansible.cfg as config file
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'
running playbook inside collection ibm.mas_devops
[DEPRECATION WARNING]: community.general.yaml has been deprecated. The plugin
has been superseded by the the option `result_format=yaml` in callback plugin
ansible.builtin.default from ansible-core 2.13 onwards. This feature will be
removed from community.general in version 13.0.0. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.
PLAY [localhost] ***************************************************************
TASK [Gathering Facts] *********************************************************
ok: [localhost]
TASK [ibm.mas_devops.ansible_version_check : Verify minimum Ansible version is 2.10.3] ***
ok: [localhost] => changed=false
msg: All assertions passed
TASK [ibm.mas_devops.ocp_verify : Check if cluster is ready] *******************
skipping: [localhost] => changed=false
false_condition: verify_cluster
skip_reason: Conditional result was False
TASK [ibm.mas_devops.ocp_verify : Check CatalogSource Status] ******************
skipping: [localhost] => changed=false
false_condition: verify_catalogsources
skip_reason: Conditional result was False
TASK [ibm.mas_devops.ocp_verify : Check Subscription Status] *******************
skipping: [localhost] => changed=false
false_condition: verify_subscriptions
skip_reason: Conditional result was False
TASK [ibm.mas_devops.ocp_verify : Check Deployment & StatefulSet Status] *******
Checking Deployments are healthy (1/40 retries with a 300 second delay)
[NOTREADY] mas-inst3-manage/inst3-masgolf-all = 1 replicas/None ready/1 updated/None available
Finished check: Delaying 300 seconds before next check
Checking Deployments are healthy (2/40 retries with a 300 second delay)
[NOTREADY] mas-inst3-manage/inst3-masgolf-all = 1 replicas/None ready/1 updated/None available
Finished check: Delaying 300 seconds before next check
Checking Deployments are healthy (3/40 retries with a 300 second delay)
[NOTREADY] mas-inst3-manage/inst3-masgolf-all = 1 replicas/None ready/1 updated/None available
Finished check: Delaying 300 seconds before next check
Checking Deployments are healthy (4/40 retries with a 300 second delay)
[NOTREADY] mas-inst3-manage/inst3-masgolf-all = 1 replicas/None ready/1 updated/None available
Finished check: Delaying 300 seconds before next check
Checking Deployments are healthy (5/40 retries with a 300 second delay)
[NOTREADY] mas-inst3-manage/inst3-masgolf-all = 1 replicas/None ready/1 updated/None available
Finished check: Delaying 300 seconds before next check
Please look at IBM case number [TS018940690]. I fully understand that at the cluster level things should be working, but for several reasons the instances should be independent. One Manage should not depend on another.
Our install, upgrade, and update pre-checks are designed to prioritize safety and to be extra pessimistic if they see anything unhealthy in the cluster. If we detect any unhealthy catalogs, operators, deployments, or StatefulSets in the cluster, we abort the operation before it makes any changes to the cluster, on the assumption that the problem found may indicate a wider problem that could impact the upgrade/install/update you are about to start.
This safety check can be disabled by adding --skip-pre-check to the command. The actual health of the other MAS instances doesn't really affect this, as long as the basic Kubernetes resources are healthy. The deployment that is unhealthy here happens to be a MAS Manage one, but the failure is assessed purely at the Kubernetes resource level: there are one or more unhealthy deployments in this cluster, which may indicate a problem in the cluster, so we will not proceed with this action (which places a reasonable amount of load on the Kubernetes API server).
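For example, once you have confirmed the unhealthy inst3 deployment is unrelated to the instance being upgraded, the inst2 upgrade can be re-run with only that flag added (the flag name is the one quoted above; everything else is unchanged from the original run):

```shell
# Re-run the interactive upgrade without the cluster-wide health pre-check
mas upgrade --skip-pre-check
```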