
Custom deployment failed

Open potejasw opened this issue 9 months ago • 10 comments

Is your issue related to a Jumpstart scenario? HCIBox

Describe the issue or the bug: OperationTimeout, no updates received from the device for the operation.

{"code":"ArcOperationTimedOut","target":"/subscriptions/3f3df5ee-74f3-4aa8-83d2-fa6558733b45/resourceGroups/PCTHCIBOX-rg/providers/Microsoft.HybridCompute/machines/AzSHOST1","message":"OperationTimeout , No updates received from device for operation: [providers/microsoft.azurestackhci/locations/EASTUS/operationStatuses/98438b4f-e55d-4580-9649-82be41c323d9*E803284F08085E1E43A65AF9A5F9852A3E3D07A9C81917EE350B85B2BFC1CABF?api-version=2023-08-01-preview] beyond timeout of [600000] ms"}

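When triaging this kind of error, it can help to pull out just the fields that matter: the error code, which Arc machine stopped reporting, and how long the service waited. The following is a minimal Python sketch, assuming the payload is valid JSON exactly as the portal shows it. `summarize_arc_timeout` is a hypothetical helper, not part of any Azure tooling, and the subscription ID and operation ID in the sample are replaced by placeholders.

```python
import json
import re


def summarize_arc_timeout(raw_error: str) -> dict:
    """Extract the error code, target machine name and timeout (in minutes)
    from an ArcOperationTimedOut payload copied from the Azure portal.
    Hypothetical helper for illustration only."""
    err = json.loads(raw_error)
    # The machine name is the last segment of the ARM resource ID in "target".
    machine = err["target"].rstrip("/").split("/")[-1]
    # Assumes the message ends with "beyond timeout of [<n>] ms", as seen above.
    match = re.search(r"timeout of \[(\d+)\] ms", err["message"])
    timeout_ms = int(match.group(1)) if match else None
    return {
        "code": err["code"],
        "machine": machine,
        "timeout_minutes": timeout_ms / 60000 if timeout_ms is not None else None,
    }


# Sample payload; <sub-id> and [...] are placeholders for the real IDs.
raw = (
    '{"code":"ArcOperationTimedOut",'
    '"target":"/subscriptions/<sub-id>/resourceGroups/PCTHCIBOX-rg/providers/'
    'Microsoft.HybridCompute/machines/AzSHOST1",'
    '"message":"OperationTimeout , No updates received from device for operation: '
    '[...] beyond timeout of [600000] ms"}'
)
print(summarize_arc_timeout(raw))
```

For the payload above, the reported timeout of 600000 ms works out to 10 minutes on node AzSHOST1.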

To Reproduce

Expected behavior: The custom deployment of HCIBox completes.

Environment summary: Azure Stack HCI, version 23H2

Have you looked at the Troubleshooting and Logs section?

Screenshots

Additional context: HCI deployment.

potejasw, May 03 '24 13:05

Hi team, do you have any update, or has an engineer been assigned?

potejasw, May 06 '24 01:05

Hi @potejasw, thanks for opening the issue. We will have someone assigned to this in a few days, as we are currently getting ready for a few major releases. Thanks for your patience and understanding.

likamrat, May 06 '24 21:05

I also wanted to add that trying out the HCIBox Jumpstart using the CLI option fails after a few hours of running the New-HCIBoxCluster.ps1 script at logon. Step 10 fails during validation, and the error message in the portal is the following (note: I removed my resource names):

{"code":"UpdateDeploymentSettingsDataFailed","message":"Deployment Settings validation failed.","details":
[{"code":"UpdateDeploymentSettingsDataFailed","target":"/subscriptions/[.......]/resourceGroups/[.......]/providers/Microsoft.AzureStackHCI/clusters/hciboxcluster","message":"Failed to create deployment settings. \nValidation status is {Status=Error, Steps={Name=Error, Description=Error executing Request: Validate, FullStepIndex=0, StartTimeUtc=5/7/2024 4:17:38 PM, EndTimeUtc=NA, Status=Error, Exception=Exception: One or more errors occurred. at:   at 
System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)\r\n   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)\r\n   at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.ExecuteRequest(Request request) in 
C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 379 Base Exception: Failed to fetch secret:LocalAdminCredential
 from Key Vault https://[[.......]].vault.azure.net with:Response status code does not indicate success: 404 (Not Found). at:  
  at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.GetSecret(String keyVaultUri, String secretName) in C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 296\r\n   at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.<InitAnswerFileAndSecrets>d__9.MoveNext() in 
C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 253\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.<ExecuteMessagesFromResourceProvider>d__5.MoveNext() in C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 94, Steps=null}}. \nDeployment Status is {Status=, Steps=null}"}]}

The secret LocalAdminCredential does exist in the Key Vault.

katriendg, May 08 '24 13:05

How do I try the CLI option? We have the following two options to create the cluster: ARM template and Azure portal.

potejasw, May 09 '24 05:05

@potejasw To clarify, I meant the Azure CLI tutorial (which deploys through Bicep/ARM), not the Azure Developer CLI one. https://azurearcjumpstart.io/azure_jumpstart_hcibox/deployment_az

katriendg, May 09 '24 08:05

@katriendg The HCIBox deployment completed and I can log in to the VM, but I have an issue creating the cluster from the ARM template.

potejasw, May 09 '24 08:05

@katriendg Do you have any news for me?

potejasw, May 16 '24 08:05

@potejasw Could you give the following a try?

On the HCI nodes, navigate to C:\ProgramData\GuestConfig\extension_logs\Microsoft.Edge.DeviceManagementExtension\ and check DeviceManagementExtension.log and state.json for any error messages. If none are found, rename the EdgeDevice.txt file to EdgeDevice.old; this regenerates the latest device information and pushes it up to the cloud within 15 minutes.
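The check-then-rename procedure above can be sketched in Python for running on each node. This is a minimal sketch under stated assumptions: it treats any line containing "error" (case-insensitive) as an error message, which is only a heuristic, and it assumes EdgeDevice.txt sits in the same extension log folder (the suggestion does not say where the file lives). `find_error_lines` and `refresh_edge_device` are hypothetical helpers, not part of any Azure tooling.

```python
from pathlib import Path

# Folder named in the suggestion above (a Windows path on the HCI nodes).
LOG_DIR = Path(r"C:\ProgramData\GuestConfig\extension_logs\Microsoft.Edge.DeviceManagementExtension")


def find_error_lines(log_dir: Path) -> list[str]:
    """Return lines that mention 'error' in DeviceManagementExtension.log
    and state.json. Simple substring heuristic, not a real log parser."""
    hits = []
    for name in ("DeviceManagementExtension.log", "state.json"):
        path = log_dir / name
        if not path.exists():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if "error" in line.lower():
                hits.append(f"{name}: {line.strip()}")
    return hits


def refresh_edge_device(log_dir: Path) -> bool:
    """If no error lines were found, rename EdgeDevice.txt to EdgeDevice.old
    so the extension regenerates it. Assumes EdgeDevice.txt lives in log_dir.
    Returns True if the file was renamed."""
    if find_error_lines(log_dir):
        return False  # investigate the reported errors first
    src = log_dir / "EdgeDevice.txt"
    if not src.exists():
        return False
    # Path.replace overwrites an existing EdgeDevice.old from an earlier attempt.
    src.replace(log_dir / "EdgeDevice.old")
    return True


if __name__ == "__main__":
    for hit in find_error_lines(LOG_DIR):
        print(hit)
    print("EdgeDevice.txt renamed:", refresh_edge_device(LOG_DIR))
```

Any error lines printed should be reviewed before retrying the deployment; the rename only happens when the scan comes back clean.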

janegilring, May 16 '24 12:05

@janegilring I tried the above action plan and retried deploying the cluster using the ARM template, but it failed with the error below:

{"code":"UpdateDeploymentSettingsDataFailed","message":"Deployment Settings validation failed.","details":[{"code":"UpdateDeploymentSettingsDataFailed","target":"/subscriptions/xxxxxxxxxxxxxxx/resourceGroups/xxxxxx-rg/providers/Microsoft.AzureStackHCI/clusters/hciboxcluster","message":"Failed to create deployment settings. \nValidation status is {Status=Error, Steps={Name=SetRegistrationParametersInECEForCloudDeployment, Description=Set Registration parameters in ECE for cloud deployment., FullStepIndex=0, StartTimeUtc=2024-05-20T09:33:31, EndTimeUtc=2024-05-20T09:33:46, Status=Success, Exception=, Steps=}, {Name=InvokeEnvironmentChecker, Description=Invoke Environment Checker action plan., FullStepIndex=1, StartTimeUtc=2024-05-20T09:33:46, EndTimeUtc=2024-05-20T09:33:50, Status=Error, Exception=System.Collections.Generic.List`1[System.String], Steps=}}. \nDeployment Status is {Status=, Steps=null}"}]}
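Validation messages like this one pack every step into a single string, which makes the failing step easy to miss. A minimal Python sketch that lists the steps reported with Status=Error, assuming the "Name=..., Description=..., ..., EndTimeUtc=..., Status=...," layout seen in the two messages in this thread; the regex is inferred from those samples only and may not match other versions. `failed_steps` is a hypothetical helper, and the cluster resource ID is replaced by a placeholder.

```python
import json
import re

# Pattern inferred from the validation messages in this thread (an assumption).
STEP_PATTERN = re.compile(
    r"Name=(?P<name>[^,]+), Description=(?P<desc>[^,]*),.*?"
    r"EndTimeUtc=[^,]*, Status=(?P<status>[^,]*),"
)


def failed_steps(raw_error: str) -> list[tuple[str, str]]:
    """Return (step name, description) for each step whose Status is Error."""
    err = json.loads(raw_error)
    failed = []
    for detail in err.get("details", []):
        for m in STEP_PATTERN.finditer(detail["message"]):
            if m.group("status") == "Error":
                failed.append((m.group("name"), m.group("desc")))
    return failed


raw = (
    r'{"code":"UpdateDeploymentSettingsDataFailed","message":"Deployment Settings validation failed.",'
    r'"details":[{"code":"UpdateDeploymentSettingsDataFailed","target":"<cluster resource ID>",'
    r'"message":"Failed to create deployment settings. \nValidation status is {Status=Error, '
    r'Steps={Name=SetRegistrationParametersInECEForCloudDeployment, Description=Set Registration '
    r'parameters in ECE for cloud deployment., FullStepIndex=0, StartTimeUtc=2024-05-20T09:33:31, '
    r'EndTimeUtc=2024-05-20T09:33:46, Status=Success, Exception=, Steps=}, '
    r'{Name=InvokeEnvironmentChecker, Description=Invoke Environment Checker action plan., '
    r'FullStepIndex=1, StartTimeUtc=2024-05-20T09:33:46, EndTimeUtc=2024-05-20T09:33:50, '
    r'Status=Error, Exception=System.Collections.Generic.List`1[System.String], Steps=}}. '
    r'\nDeployment Status is {Status=, Steps=null}"}]}'
)
print(failed_steps(raw))
```

Here it reports InvokeEnvironmentChecker as the only failed step. Note that the Exception field only contains the .NET type name System.Collections.Generic.List`1[System.String], so the individual Environment Checker messages are not included in this payload.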

potejasw, May 20 '24 09:05

@potejasw Thanks for the update. At this point I would suggest deleting the resource group, running git pull in your local Jumpstart folder, and trying a fresh deployment.
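The first two steps of that reset can be scripted. This is a sketch assuming the Azure CLI (`az`) and `git` are on the PATH and you are already signed in with `az login`; `cleanup_commands` and `run_commands` are hypothetical helpers, and the resource group name and clone path are placeholders. The fresh deployment itself is left to the Azure CLI tutorial linked earlier in this thread, so it is not scripted here.

```python
import subprocess


def cleanup_commands(resource_group: str, jumpstart_dir: str) -> list[list[str]]:
    """Build the commands for the suggested reset (hypothetical helper)."""
    return [
        # Deletes the whole resource group; --yes skips the confirmation prompt.
        ["az", "group", "delete", "--name", resource_group, "--yes"],
        # -C runs git as if it were started inside the local Jumpstart clone.
        ["git", "-C", jumpstart_dir, "pull"],
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Usage would look like `run_commands(cleanup_commands("<your-resource-group>", "<path-to-your-jumpstart-clone>"))`. Building the argument lists separately from running them makes it easy to print and double-check the resource group name before anything is deleted.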

janegilring, May 20 '24 10:05

@potejasw Do you need further assistance, or can we close this issue?

janegilring, May 28 '24 12:05

@janegilring Please close this. I think the issue was that the HCIBox deployment was half-baked.

I was able to create a new HCIBox, which solved my problem.

potejasw, May 29 '24 04:05