
MAS Assist install causes degraded storage class

Open · marekpolujan opened this issue 8 months ago · 1 comment

MAS CLI version

13.21.0

CLI function used

install

What happened?

After a successful MAS install of Manage only, I ran a second MAS install to add Assist standalone. During the pipeline installation there was a failure in the COS step (log attached). After the failure, my OpenShift Data Foundation storage shows a degraded service. This has occurred on two different Techzone image installs.

[screenshot attached]

mas install.txt
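
For reference, the degraded state can be confirmed from the command line; a minimal sketch, assuming the default openshift-storage namespace and the resource names visible in the log below:

# Overall StorageCluster phase and conditions (look for Degraded=True)
oc get storagecluster ocs-external-storagecluster -n openshift-storage -o yaml

# Health of the underlying Ceph cluster as reported by Rook
oc get cephcluster -n openshift-storage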

Relevant log output

STEP-COS

Export all env vars defined in /workspace/settings
Using /opt/app-root/src/ansible.cfg as config file
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'
running playbook inside collection ibm.mas_devops
[DEPRECATION WARNING]: community.general.yaml has been deprecated. The plugin 
has been superseded by the the option `result_format=yaml` in callback plugin 
ansible.builtin.default from ansible-core 2.13 onwards. This feature will be 
removed from community.general in version 13.0.0. Deprecation warnings can be 
disabled by setting deprecation_warnings=False in ansible.cfg.

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [ibm.mas_devops.ansible_version_check : Verify minimum Ansible version is 2.10.3] ***
ok: [localhost] => changed=false 
  msg: All assertions passed

TASK [ibm.mas_devops.cos : Fail if cos_type is not provided] *******************
ok: [localhost] => changed=false 
  msg: All assertions passed

TASK [ibm.mas_devops.cos : Fail if cos_action is not provided] *****************
ok: [localhost] => changed=false 
  msg: All assertions passed

TASK [ibm.mas_devops.cos : Run the task for the appropriate provider] **********
included: /opt/app-root/lib64/python3.9/site-packages/ansible_collections/ibm/mas_devops/roles/cos/tasks/providers/ocs/provision.yml for localhost

TASK [ibm.mas_devops.cos : Get OCS cluster] ************************************
ok: [localhost] => changed=false 
  api_found: true
  resources:
  - apiVersion: ocs.openshift.io/v1
    kind: StorageCluster
    metadata:
      annotations:
        argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
        argocd.argoproj.io/sync-wave: '20'
        argocd.argoproj.io/tracking-id: hub01-infra01-itz-hlfkn3-odf-external-instance:ocs.openshift.io/StorageCluster:openshift-storage/ocs-external-storagecluster
        kubectl.kubernetes.io/last-applied-configuration: |-
          {"apiVersion":"ocs.openshift.io/v1","kind":"StorageCluster","metadata":{"annotations":{"argocd.argoproj.io/sync-options":"SkipDryRunOnMissingResource=true","argocd.argoproj.io/sync-wave":"20","argocd.argoproj.io/tracking-id":"hub01-infra01-itz-hlfkn3-odf-external-instance:ocs.openshift.io/StorageCluster:openshift-storage/ocs-external-storagecluster"},"name":"ocs-external-storagecluster","namespace":"openshift-storage"},"spec":{"encryption":{},"externalStorage":{"enable":true},"labelSelector":{},"managedResources":{"cephBlockPools":{"disableStorageClass":true},"cephFilesystems":{},"cephObjectStoreUsers":{},"cephObjectStores":{}}}}
        uninstall.ocs.openshift.io/cleanup-policy: delete
        uninstall.ocs.openshift.io/mode: graceful
      creationTimestamp: '2025-05-09T22:17:38Z'
      finalizers:
      - storagecluster.ocs.openshift.io
      generation: 2
      managedFields:
      - apiVersion: ocs.openshift.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              .: {}
              f:argocd.argoproj.io/sync-options: {}
              f:argocd.argoproj.io/sync-wave: {}
              f:argocd.argoproj.io/tracking-id: {}
              f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:spec:
            .: {}
            f:encryption: {}
            f:externalStorage:
              .: {}
              f:enable: {}
            f:labelSelector: {}
            f:managedResources:
              .: {}
              f:cephBlockPools:
                .: {}
                f:disableStorageClass: {}
              f:cephFilesystems: {}
              f:cephObjectStoreUsers: {}
              f:cephObjectStores: {}
        manager: argocd-controller
        operation: Update
        time: '2025-05-09T22:17:38Z'
      - apiVersion: ocs.openshift.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:uninstall.ocs.openshift.io/cleanup-policy: {}
              f:uninstall.ocs.openshift.io/mode: {}
            f:finalizers:
              .: {}
              v:"storagecluster.ocs.openshift.io": {}
          f:spec:
            f:arbiter: {}
            f:encryption:
              f:keyRotation:
                .: {}
                f:schedule: {}
              f:kms: {}
            f:managedResources:
              f:cephCluster: {}
              f:cephConfig: {}
              f:cephDashboard: {}
              f:cephFilesystems:
                f:dataPoolSpec:
                  .: {}
                  f:application: {}
                  f:erasureCoded:
                    .: {}
                    f:codingChunks: {}
                    f:dataChunks: {}
                  f:mirroring: {}
                  f:quotas: {}
                  f:replicated:
                    .: {}
                    f:size: {}
                  f:statusCheck:
                    .: {}
                    f:mirror: {}
              f:cephNonResilientPools:
                .: {}
                f:count: {}
                f:resources: {}
                f:volumeClaimTemplate:
                  .: {}
                  f:metadata: {}
                  f:spec:
                    .: {}
                    f:resources: {}
                  f:status: {}
              f:cephRBDMirror:
                .: {}
                f:daemonCount: {}
              f:cephToolbox: {}
            f:mirroring: {}
        manager: ocs-operator
        operation: Update
        time: '2025-05-09T22:17:53Z'
      - apiVersion: ocs.openshift.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:ownerReferences:
              .: {}
              k:{"uid":"32716c3a-0d26-446c-afc4-4d26d2333861"}: {}
        manager: manager
        operation: Update
        time: '2025-05-09T22:24:02Z'
      - apiVersion: ocs.openshift.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:status:
            .: {}
            f:conditions: {}
            f:externalSecretHash: {}
            f:images:
              .: {}
              f:ceph:
                .: {}
                f:desiredImage: {}
              f:noobaaCore:
                .: {}
                f:actualImage: {}
                f:desiredImage: {}
              f:noobaaDB:
                .: {}
                f:actualImage: {}
                f:desiredImage: {}
            f:kmsServerConnection: {}
            f:phase: {}
            f:relatedObjects: {}
            f:version: {}
        manager: ocs-operator
        operation: Update
        subresource: status
        time: '2025-05-12T14:55:06Z'
      name: ocs-external-storagecluster
      namespace: openshift-storage
      ownerReferences:
      - apiVersion: odf.openshift.io/v1alpha1
        kind: StorageSystem
        name: ocs-external-storagecluster-storagesystem
        uid: 32716c3a-0d26-446c-afc4-4d26d2333861
      resourceVersion: '3844969'
      uid: f037543a-9b3b-4733-ae49-2fb82f8342cb
    spec:
      arbiter: {}
      encryption:
        keyRotation:
          schedule: '@weekly'
        kms: {}
      externalStorage:
        enable: true
      labelSelector: {}
      managedResources:
        cephBlockPools:
          disableStorageClass: true
        cephCluster: {}
        cephConfig: {}
        cephDashboard: {}
        cephFilesystems:
          dataPoolSpec:
            application: ''
            erasureCoded:
              codingChunks: 0
              dataChunks: 0
            mirroring: {}
            quotas: {}
            replicated:
              size: 0
            statusCheck:
              mirror: {}
        cephNonResilientPools:
          count: 1
          resources: {}
          volumeClaimTemplate:
            metadata: {}
            spec:
              resources: {}
            status: {}
        cephObjectStoreUsers: {}
        cephObjectStores: {}
        cephRBDMirror:
          daemonCount: 1
        cephToolbox: {}
      mirroring: {}
    status:
      conditions:
      - lastHeartbeatTime: '2025-05-09T22:17:53Z'
        lastTransitionTime: '2025-05-09T22:17:53Z'
        message: Version check successful
        reason: VersionMatched
        status: 'False'
        type: VersionMismatch
      - lastHeartbeatTime: '2025-05-12T14:55:05Z'
        lastTransitionTime: '2025-05-09T22:24:03Z'
        message: Reconcile completed successfully
        reason: ReconcileCompleted
        status: 'True'
        type: ReconcileComplete
      - lastHeartbeatTime: '2025-05-12T14:55:05Z'
        lastTransitionTime: '2025-05-09T22:24:03Z'
        message: Reconcile completed successfully
        reason: ReconcileCompleted
        status: 'True'
        type: Available
      - lastHeartbeatTime: '2025-05-12T14:55:05Z'
        lastTransitionTime: '2025-05-09T22:24:03Z'
        message: Reconcile completed successfully
        reason: ReconcileCompleted
        status: 'False'
        type: Progressing
      - lastHeartbeatTime: '2025-05-12T14:55:05Z'
        lastTransitionTime: '2025-05-09T22:17:53Z'
        message: Reconcile completed successfully
        reason: ReconcileCompleted
        status: 'False'
        type: Degraded
      - lastHeartbeatTime: '2025-05-12T14:55:05Z'
        lastTransitionTime: '2025-05-09T22:24:03Z'
        message: Reconcile completed successfully
        reason: ReconcileCompleted
        status: 'True'
        type: Upgradeable
      externalSecretHash: 079933254724f79fabf52bab5d2bef9858f030248d97d018767adfbad58a040e899cab38cbbcb8634c1b239b9c5e8b26b5e56a5cc964ca6c907143ef8fc01539
      images:
        ceph:
          desiredImage: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:74f12deed91db0e478d5801c08959e451e0dbef427497badef7a2d8829631882
        noobaaCore:
          actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:0176d3ecd09d375ccfce03657b47b0d597131991ebd16f903171248fee383a6c
          desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:0176d3ecd09d375ccfce03657b47b0d597131991ebd16f903171248fee383a6c
        noobaaDB:
          actualImage: registry.redhat.io/rhel9/postgresql-15@sha256:fbed5b1292e8f08e47d4fffe55cce6e51519082c3c7f80736d0082f90292a3f6
          desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:fbed5b1292e8f08e47d4fffe55cce6e51519082c3c7f80736d0082f90292a3f6
      kmsServerConnection: {}
      phase: Ready
      relatedObjects:
      - apiVersion: ceph.rook.io/v1
        kind: CephCluster
        name: ocs-external-storagecluster-cephcluster
        namespace: openshift-storage
        resourceVersion: '3844949'
        uid: ec3cd79b-edfb-48b7-9c11-6e43add7b76d
      - apiVersion: noobaa.io/v1alpha1
        kind: NooBaa
        name: noobaa
        namespace: openshift-storage
        resourceVersion: '3844799'
        uid: c54be751-ceb7-4ef5-99fb-a65297cb90e6
      version: 4.16.9

TASK [ibm.mas_devops.cos : OCS cluster status] *********************************
ok: [localhost] => changed=false 
  ansible_facts:
    ocsavailable: true

TASK [ibm.mas_devops.cos : Fail if OCS Cluster is not provided] ****************
ok: [localhost] => changed=false 
  msg: All assertions passed

TASK [ibm.mas_devops.cos : Print If there's OCS cluster] ***********************
ok: [localhost] => 
  msg:
  - OCS Cluster is available .... True

TASK [ibm.mas_devops.cos : ocs/objectstorage : Create objectstore in OSC Cluster] ***
[WARNING]: unknown field "spec.gateway.allNodes"
[WARNING]: unknown field "spec.gateway.type"
changed: [localhost] => changed=true 
  method: apply
  result:
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStore
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"ceph.rook.io/v1","kind":"CephObjectStore","metadata":{"name":"object","namespace":"openshift-storage"},"spec":{"dataPool":{"failureDomain":"host","replicated":{"size":2}},"gateway":{"allNodes":false,"instances":1,"placement":null,"port":8081,"resources":null,"securePort":null,"sslCertificateRef":null,"type":"s3"},"metadataPool":{"failureDomain":"host","replicated":{"size":2}}}}'
      creationTimestamp: '2025-05-12T14:55:29Z'
      generation: 1
      managedFields:
      - apiVersion: ceph.rook.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              .: {}
              f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:spec:
            .: {}
            f:dataPool:
              .: {}
              f:failureDomain: {}
              f:replicated:
                .: {}
                f:size: {}
            f:gateway:
              .: {}
              f:instances: {}
              f:placement: {}
              f:port: {}
              f:resources: {}
              f:securePort: {}
              f:sslCertificateRef: {}
            f:metadataPool:
              .: {}
              f:failureDomain: {}
              f:replicated:
                .: {}
                f:size: {}
        manager: OpenAPI-Generator
        operation: Update
        time: '2025-05-12T14:55:29Z'
      name: object
      namespace: openshift-storage
      resourceVersion: '3845547'
      uid: f647c2eb-87c8-4226-a15e-b3055b9188c1
    spec:
      dataPool:
        failureDomain: host
        replicated:
          size: 2
      gateway:
        instances: 1
        placement: null
        port: 8081
        resources: null
        securePort: null
        sslCertificateRef: null
      metadataPool:
        failureDomain: host
        replicated:
          size: 2

TASK [ibm.mas_devops.cos : ocs/objectstorage : Create objectstore User] ********
changed: [localhost] => changed=true 
  method: apply
  result:
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStoreUser
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"ceph.rook.io/v1","kind":"CephObjectStoreUser","metadata":{"name":"object","namespace":"openshift-storage"},"spec":{"displayName":"s3-user3","store":"object"}}'
      creationTimestamp: '2025-05-12T14:55:30Z'
      generation: 1
      managedFields:
      - apiVersion: ceph.rook.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              .: {}
              f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:spec:
            .: {}
            f:displayName: {}
            f:store: {}
        manager: OpenAPI-Generator
        operation: Update
        time: '2025-05-12T14:55:30Z'
      name: object
      namespace: openshift-storage
      resourceVersion: '3845573'
      uid: 83582315-2f59-40da-8f8f-f7ef53e57a86
    spec:
      displayName: s3-user3
      store: object

TASK [ibm.mas_devops.cos : Wait for Ceph os user to be ready (60s delay)] ******
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (10 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (9 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (8 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (7 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (6 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (5 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (4 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (3 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (2 retries left).
FAILED - RETRYING: [localhost]: Wait for Ceph os user to be ready (60s delay) (1 retries left).
fatal: [localhost]: FAILED! => changed=false 
  api_found: true
  attempts: 10
  resources:
  - apiVersion: ceph.rook.io/v1
    kind: CephObjectStoreUser
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"ceph.rook.io/v1","kind":"CephObjectStoreUser","metadata":{"name":"object","namespace":"openshift-storage"},"spec":{"displayName":"s3-user3","store":"object"}}'
      creationTimestamp: '2025-05-12T14:55:30Z'
      finalizers:
      - cephobjectstoreuser.ceph.rook.io
      generation: 1
      managedFields:
      - apiVersion: ceph.rook.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              .: {}
              f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:spec:
            .: {}
            f:displayName: {}
            f:store: {}
        manager: OpenAPI-Generator
        operation: Update
        time: '2025-05-12T14:55:30Z'
      - apiVersion: ceph.rook.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:finalizers:
              .: {}
              v:"cephobjectstoreuser.ceph.rook.io": {}
        manager: rook
        operation: Update
        time: '2025-05-12T14:55:30Z'
      - apiVersion: ceph.rook.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:status:
            .: {}
            f:phase: {}
        manager: rook
        operation: Update
        subresource: status
        time: '2025-05-12T14:55:31Z'
      name: object
      namespace: openshift-storage
      resourceVersion: '3845604'
      uid: 83582315-2f59-40da-8f8f-f7ef53e57a86
    spec:
      displayName: s3-user3
      store: object
    status:
      phase: ReconcileFailed

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost                  : ok=11   changed=2    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
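
For anyone debugging the same failure: the task that times out is waiting for the CephObjectStoreUser to reach Ready, but it stays in ReconcileFailed. A minimal sketch for digging into why, using the resource names from the log above (the operator deployment name is an assumption and may vary by ODF version):

# Inspect the object store and user the cos role created
oc get cephobjectstore object -n openshift-storage -o yaml
oc get cephobjectstoreuser object -n openshift-storage -o yaml

# The Rook operator log usually states why reconciliation failed
oc logs -n openshift-storage deploy/rook-ceph-operator | grep -i objectstore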

marekpolujan · May 12 '25 15:05

Case TS019247012 was created with IBM Techzone support, and they responded:

"The ODF Object Storage errors have been cleaned. Did user create this CephObjectStore? It is not compatible with our OCP-V external ODF and the operator was unable to reconcile it. I had to remove it."

marekpolujan · May 12 '25 19:05
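
If others need to repeat the cleanup Techzone support performed, removing the incompatible resources would look roughly like this (a sketch, assuming nothing else consumes them; deletion can hang on finalizers if the Rook operator itself is unhealthy):

oc delete cephobjectstoreuser object -n openshift-storage
oc delete cephobjectstore object -n openshift-storage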