Unable to create ODF/OCS storage cluster on ARO cluster v10

Open maulik-shah999 opened this issue 3 years ago • 3 comments

RedHat Case: https://access.redhat.com/support/cases/#/case/03267534/

What problem/issue/behavior are you having trouble with? What do you expect to see?

We are trying to add an ODF storage cluster to the ARO cluster, following the documentation at https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/deploying_openshift_data_foundation_using_microsoft_azure_and_azure_red_hat_openshift/deploying-openshift-data-foundation-on-microsoft-azure_azure

However, the deployment fails while trying to mount the PVC, and the rook-ceph-mon pods fail to initialize.

rook-ceph-mon-a-5559cf8ccb-79tsr   0/2   Init:0/2   0   5h43m
rook-ceph-mon-b-66b95854d-jc2j4    0/2   Init:0/2   0   5h32m
rook-ceph-mon-c-5c5958bb6-4bg69    0/2   Init:0/2   0   5h20m

We see these errors in the pod events:

Warning FailedMount 3m22s (x2 over 17m) kubelet Unable to attach or mount volumes: unmounted volumes=[ceph-daemon-data], unattached volumes=[ceph-daemon-data kube-api-access-jr746 rook-config-override rook-ceph-mons-keyring rook-ceph-log rook-ceph-crash]: timed out waiting for the condition

Warning FailedAttachVolume 65s (x16 over 23m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-126648a8-85e0-40da-b1c9-f3fe44e89557" : rpc error: code = NotFound desc = Volume not found, failed with error: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions//resourceGroups/aro-ms-ocs12-aro450/providers/Microsoft.Compute/disks/ms-ocs12-aro450-t4xcz-dynamic-pvc-126648a8-85e0-40da-b1c9-f3fe44e89557?api-version=2021-04-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: Endpoint https://login.microsoftonline.com/oauth2/token

Warning FailedMount 64s (x6 over 19m) kubelet Unable to attach or mount volumes: unmounted volumes=[ceph-daemon-data], unattached volumes=[rook-ceph-crash ceph-daemon-data kube-api-access-jr746 rook-config-override rook-ceph-mons-keyring rook-ceph-log]: timed out waiting for the condition

What is the business impact? Please also provide timeframe information.

We are not able to install Cloud Pak for Data on ARO.

Where are you experiencing the behavior? What environment?

Azure Red Hat OpenShift (ARO) managed service.

When does the behavior occur? Frequency? Repeatedly? At certain times?

Always.

maulik-shah999 avatar Aug 12 '22 23:08 maulik-shah999

I discussed this with the Red Hat team, and they need more logs from the Microsoft team to debug the issue. I would really appreciate any help you can provide. Thanks.

maulik-shah999 avatar Aug 15 '22 23:08 maulik-shah999

Hi @maulik-shah999
I recommend working through the support case for this issue; that will get you to a resolution the quickest.

Thanks, Jerome

jboutaud avatar Aug 16 '22 15:08 jboutaud

@jboutaud Thanks for your response. I created support case 2208160010006487 on the Azure Portal. Can you please ask the support team to look into this? Thanks.

maulik-shah999 avatar Aug 16 '22 20:08 maulik-shah999

Is there any progress or information on this issue? We recently ran into the same problem while trying to get azure-files-csi to work on ARO 4.10.40.

bartek-lopatka avatar Mar 28 '23 14:03 bartek-lopatka

For what it is worth, and although it is not clear whether this is supported, I was able to create an ODF cluster inside ARO 4.10.54.

nastacio avatar Jun 29 '23 15:06 nastacio

Yes, it is supported. The Red Hat team resolved this issue in ARO OpenShift version 4.10.23, so you should be able to install ODF on any 4.10.x version from 4.10.23 onward. I have not seen any issues on ARO 4.10.40 or 4.10.54 so far.
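
As a rough illustration of the version rule above, here is a minimal Python sketch that checks whether a 4.10.x version string is at or above 4.10.23. The threshold comes only from this thread; the names `parse_version` and `has_odf_fix` are hypothetical helpers, and other minor series are deliberately rejected because nothing here says which of their releases carry the fix.

```python
# Assumption: 4.10.23 is the first 4.10.x release with the fix,
# as reported in this thread (not verified against release notes).
FIRST_FIXED = (4, 10, 23)


def parse_version(version):
    """Split an 'X.Y.Z' version string into a tuple of three integers."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a version of the form X.Y.Z: %r" % version)
    return tuple(int(part) for part in parts)


def has_odf_fix(version):
    """Return True if a 4.10.x version is at or above 4.10.23.

    Raises ValueError for any other minor series, since this thread
    only covers 4.10.x.
    """
    major, minor, patch = parse_version(version)
    if (major, minor) != FIRST_FIXED[:2]:
        raise ValueError("only the 4.10.x series is covered: %r" % version)
    return patch >= FIRST_FIXED[2]
```

The version string to pass in would typically be the one reported by `oc get clusterversion` on the cluster, for example `has_odf_fix("4.10.40")`.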

maulik-shah999 avatar Jun 29 '23 17:06 maulik-shah999