Improve error reporting when instance fails to reconcile

Open BrendanFishback opened this issue 11 months ago • 1 comments

Feature and motivation

When running the MAS CLI mas install command, if any existing MAS Suite CR in the cluster is in a problematic state, the installer will fail on Step 3. The error message will be:

3) Review MAS Instances
The following MAS intances are installed on the target cluster and will be affected by the catalog update:
Traceback (most recent call last):
  File "/opt/app-root/bin/mas-cli", line 62, in <module>
    app.update(argv[2:])
  File "/opt/app-root/lib64/python3.9/site-packages/mas/cli/update/app.py", line 94, in update
    self.reviewMASInstance()
  File "/opt/app-root/lib64/python3.9/site-packages/mas/cli/update/app.py", line 225, in reviewMASInstance
    suites = listMasInstances(self.dynamicClient)
  File "/opt/app-root/lib64/python3.9/site-packages/mas/devops/mas.py", line 124, in listMasInstances
    logger.info(f" * {suite['metadata']['name']} v{suite['status']['versions']['reconciled']}")
KeyError: 'reconciled'

This error message does not explain clearly what the problem is, or which Suite CR is causing the error message. The proposed enhancement would be for the error message to clearly state which Suite CR needs to be reviewed/fixed before the installation can proceed.
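As a rough illustration of the kind of check being requested, here is a minimal sketch. It assumes each Suite CR is available as a plain nested dict; the names `find_unreconciled_suites`, `SuiteNotReconciledError`, and `check_suites` are hypothetical and are not part of the existing mas-cli or mas-devops code.

```python
from typing import Any, Dict, List


class SuiteNotReconciledError(Exception):
    """Hypothetical error type; the real CLI may use its own fatal error class."""


def find_unreconciled_suites(suites: List[Dict[str, Any]]) -> List[str]:
    """Return the names of Suite CRs that do not report a reconciled version.

    Assumes each suite is a plain dict shaped like the CR in the traceback:
    suite['metadata']['name'] and suite['status']['versions']['reconciled'].
    """
    broken = []
    for suite in suites:
        name = (suite.get("metadata") or {}).get("name", "<unknown>")
        # 'status' or 'versions' may be missing entirely on an unhealthy CR
        versions = (suite.get("status") or {}).get("versions") or {}
        if "reconciled" not in versions:
            broken.append(name)
    return broken


def check_suites(suites: List[Dict[str, Any]]) -> None:
    """Raise a single clear error naming every Suite CR that needs attention."""
    broken = find_unreconciled_suites(suites)
    if broken:
        raise SuiteNotReconciledError(
            "The following Suite CRs have no reconciled version and must be "
            "reviewed or fixed before continuing: " + ", ".join(broken)
        )
```

With a check like this run before the instance list is printed, the failure would name the affected Suite CR instead of surfacing a bare `KeyError: 'reconciled'`.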

Usage example

This would minimize the time needed to investigate installations that fail because of this error, and would reduce the number of support cases created by clients who do not understand what the current error means.

BrendanFishback · Jan 16 '25 15:01

The proposed enhancement would be for the error message to clearly state which Suite CR needs to be reviewed/fixed before the installation can proceed.

Agreed, we should be able to catch instances in an unexpected state and trigger a FatalError with a clear message indicating that we don't recommend (and thus don't support) installing a MAS instance on a cluster that is already unhealthy, and provide details of the MAS instance that's not in a good state.

durera · Mar 07 '25 11:03