
CLI Support for ResiliencyPolicy (including ResilientVMCreationPolicy, ResilientVMDeletionPolicy) on VMSS

Open hilaryw29 opened this issue 1 year ago • 4 comments

Preconditions

  • [X] No need to upgrade Python SDK or the Python SDK is ready.

Related command

No response

Resource Provider

Microsoft.Compute/ComputeRP

Description of Feature or Work Requested

Our ask is CLI support for ResiliencyPolicy (including ResilientVMCreationPolicy and ResilientVMDeletionPolicy), a new feature on VMSS.

Resilient VM Create will automatically recover customers from OS Provisioning Timeout and VM Start Timeout errors experienced during a VM Create operation by deleting and recreating the affected VM. Resilient VM Delete will retry VM Delete requests asynchronously in the event of a failed delete operation. This feature will be available on VMSS Uniform and Flex.

API Design:

{
    "properties": {
        "ResiliencyPolicy": {
            "ResilientVMCreationPolicy": {
                "Enabled": true
            },
            "ResilientVMDeletionPolicy": {
                "Enabled": true
            }
        }
    }
}
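As a rough illustration of how a client could assemble this fragment from two optional switches, here is a minimal Python sketch. The helper name build_resiliency_policy and its parameters are hypothetical and not part of azure-cli. The camelCase key spelling follows the request body logged later in this thread rather than the PascalCase shown above, which is an assumption about what the service expects.

```python
import json


def build_resiliency_policy(enable_creation=None, enable_deletion=None):
    """Return a 'resiliencyPolicy' dict, or None if neither switch was given.

    Hypothetical helper: None means "switch not supplied", so that
    sub-policy is left out of the payload instead of being sent as false.
    """
    policy = {}
    if enable_creation is not None:
        policy["resilientVMCreationPolicy"] = {"enabled": bool(enable_creation)}
    if enable_deletion is not None:
        policy["resilientVMDeletionPolicy"] = {"enabled": bool(enable_deletion)}
    return policy or None


# Assemble the "properties" fragment with both policies enabled.
properties = {}
policy = build_resiliency_policy(enable_creation=True, enable_deletion=True)
if policy is not None:
    properties["resiliencyPolicy"] = policy

print(json.dumps({"properties": properties}, indent=2))
```

Leaving an unset switch out of the payload, instead of sending false, keeps the request unchanged for callers that never opt in.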

If Enabled == true for ResilientVMCreationPolicy, VMSS will automatically delete and recreate an instance that hits an OS Provisioning Timeout or VM Start Timeout during the creation process. VMSS will retry the create operation up to a set retryLimit (configured in the backend, not exposed to customers). If Enabled == false, the instance will fall into the "Provisioning Failed" state if an OS Provisioning Timeout or VM Start Timeout is encountered during the VM create process.
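The create-side behavior described above can be modeled with a small Python simulation. This is only a sketch of the decision logic, not the service implementation. The error labels, the retry_limit default of 3, and the assumption that an instance ends in "Provisioning Failed" once retries are exhausted or when a different error occurs are all illustrative; the real retryLimit is backend-configured and not exposed.

```python
# Illustrative labels for the two errors the policy recovers from.
RECOVERABLE = {"OSProvisioningTimeout", "VMStartTimeout"}


def create_instance(attempt_outcomes, enabled, retry_limit=3):
    """Simulate the create flow for one instance.

    attempt_outcomes: outcome of each create attempt, in order; either
        "Succeeded" or an error label.
    retry_limit: hypothetical number of retries after the first attempt.
    Returns (final_state, attempts_used).
    """
    attempts = 0
    for outcome in attempt_outcomes:
        attempts += 1
        if outcome == "Succeeded":
            return "Succeeded", attempts
        # Policy disabled, or an error the policy does not cover: no retry.
        if not enabled or outcome not in RECOVERABLE:
            return "Provisioning Failed", attempts
        # Assumed behavior once the retry budget is used up.
        if attempts > retry_limit:
            return "Provisioning Failed", attempts
        # Policy enabled: delete the VM and recreate it on the next iteration.
    return "Provisioning Failed", attempts


print(create_instance(["OSProvisioningTimeout", "Succeeded"], enabled=True))
print(create_instance(["OSProvisioningTimeout", "Succeeded"], enabled=False))
```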

If Enabled == true for ResilientVMDeletionPolicy, VMSS will automatically retry delete operations asynchronously if the initial VM deletion fails. If Enabled == false, the VM will fall into a "Failed"/unusable state if the delete operation fails.
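The delete-side decision can be sketched the same way with asyncio. This models only when a retry happens; in the service the retries run in the background. The delete_once callable, max_retries, delay_s, and the "Failed" result after exhausted retries are hypothetical, since the source does not state a retry count or the final state in that case.

```python
import asyncio


async def delete_vm(delete_once, enabled, max_retries=3, delay_s=0.0):
    """Simulate one VM delete under ResilientVMDeletionPolicy.

    delete_once: hypothetical async callable, returns True on success.
    max_retries, delay_s: illustrative values, not documented settings.
    """
    if await delete_once():
        return "Deleted"
    if not enabled:
        return "Failed"  # no retry when the policy is disabled
    for _ in range(max_retries):
        await asyncio.sleep(delay_s)
        if await delete_once():
            return "Deleted"
    return "Failed"  # assumed outcome once retries are exhausted


def make_flaky_delete(failures):
    """Build a fake delete call that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    async def delete_once():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return False
        return True

    return delete_once


print(asyncio.run(delete_vm(make_flaky_delete(2), enabled=True)))
print(asyncio.run(delete_vm(make_flaky_delete(2), enabled=False)))
```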

Dev Spec, Resilient VM Create: https://microsoft.sharepoint.com/:w:/t/ComputeVM/EWXZuEBUcm9Gvl6e7e0PlcQBAMwkAODWW1b5Z5xDdf1GuA?e=Qy8Vzb

Dev Spec, Resilient VM Delete: Reliable VM Deletion.docx

PM Spec: https://microsoft.sharepoint.com/:w:/t/AzureComputeInfrastructurePlatformTeam/EZNWFEW6xONLiWMFq33asfoB_FfF0YwjVbuBg-xCQwzbXA?e=44GuXz

Minimum API Version Required

2023-09-01

Swagger PR link / SDK link

https://github.com/Azure/azure-rest-api-specs/blob/c75671ae0a5d82d7d96fda75150478e8581408d8/specification/compute/resource-manager/Microsoft.Compute/ComputeRP/stable/2023-09-01/virtualMachineScaleSet.json#L3914

Request Example

No response

Target Date

2024-06-21

PM Contact

hilarywang

Engineer Contact

raredd

Additional context

No response

hilaryw29 Feb 24 '24 01:02

Thank you for opening this issue, we will look into it.

yonzhan Feb 24 '24 01:02

Target Date 2024-06-21

@hilaryw29 Does this ETA mean that the work must be completed before 2024-06-21, or only that it should be completed around 2024-06-21?

zhoxing-ms Feb 26 '24 03:02

@zhoxing-ms We are now targeting public preview in May, so completing this before 2024-05-15 would be great. Otherwise, my understanding of the proposed ETA is that it should be done before 2024-06-21. Thanks!

hilaryw29 Apr 02 '24 16:04

@hilaryw29 While working on this feature, I hit the following error when creating test cases: {"code":"InvalidParameter","target":"resilientVMCreationPolicy","message":"The value of parameter resilientVMCreationPolicy is invalid."} The value sent in the template is "resiliencyPolicy": {"resilientVMCreationPolicy": {"enabled": true}}.

The x-ms-client-request-id is 0ca85d89-0dda-11ef-b537-d08e79002336 and the x-ms-request-id is bd682a71-730f-46ba-8167-98edc4d9ff3a.
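One way to narrow down an InvalidParameter error like this is to pull the scale set resource out of the deployment request body captured by --debug and check which apiVersion and resiliencyPolicy value were actually sent. A minimal Python sketch, assuming the body has the deployment shape seen in this request (properties.template.resources); find_vmss_resiliency is a hypothetical helper, and the sample body is a trimmed stand-in that keeps only the relevant fields:

```python
import json

VMSS_TYPE = "Microsoft.Compute/virtualMachineScaleSets"


def find_vmss_resiliency(deployment_body):
    """List name, apiVersion and resiliencyPolicy for each VMSS resource."""
    template = deployment_body["properties"]["template"]
    found = []
    for resource in template.get("resources", []):
        if resource.get("type") == VMSS_TYPE:
            found.append({
                "name": resource.get("name"),
                "apiVersion": resource.get("apiVersion"),
                "resiliencyPolicy": resource.get("properties", {}).get("resiliencyPolicy"),
            })
    return found


# Trimmed stand-in for the logged request body (only the relevant fields).
body = json.loads("""
{
  "properties": {
    "template": {
      "resources": [
        {"type": "Microsoft.Network/loadBalancers", "name": "my2vmssLB",
         "apiVersion": "2022-01-01", "properties": {}},
        {"type": "Microsoft.Compute/virtualMachineScaleSets", "name": "my2vmss",
         "apiVersion": "2024-03-01",
         "properties": {
           "resiliencyPolicy": {"resilientVMCreationPolicy": {"enabled": true}},
           "orchestrationMode": "Uniform"
         }}
      ]
    },
    "mode": "incremental"
  }
}
""")

for entry in find_vmss_resiliency(body):
    print(json.dumps(entry))
```

This only confirms what the client sent; whether the service accepts that value for a given subscription, region, or API version still has to be checked on the service side.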

The log of the related request:

cli.azure.cli.core.sdk.policies: Request URL: 'https://management.azure.com/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourcegroups/qinkai-test/providers/Microsoft.Resources/deployments/vmss_deploy_aGiadnMHBflAWT19YgWEcKU2DFEwyex1?api-version=2022-09-01'
cli.azure.cli.core.sdk.policies: Request method: 'PUT'
cli.azure.cli.core.sdk.policies: Request headers:
cli.azure.cli.core.sdk.policies:     'Content-Type': 'application/json'
cli.azure.cli.core.sdk.policies:     'Content-Length': '4635'
cli.azure.cli.core.sdk.policies:     'Accept': 'application/json'
cli.azure.cli.core.sdk.policies:     'x-ms-client-request-id': '0ca85d89-0dda-11ef-b537-d08e79002336'
cli.azure.cli.core.sdk.policies:     'CommandName': 'vmss create'
cli.azure.cli.core.sdk.policies:     'ParameterSetName': '--debug -g -n --image --vm-sku --admin-username --admin-password --enable-resilient-vm-creation --upgrade-policy-mode --orchestration-mode'
cli.azure.cli.core.sdk.policies:     'User-Agent': 'AZURECLI/2.60.0 (PIP) azsdk-python-core/1.28.0 Python/3.10.11 (Windows-10-10.0.22631-SP0)'
cli.azure.cli.core.sdk.policies:     'Authorization': '*****'
cli.azure.cli.core.sdk.policies: Request body:
cli.azure.cli.core.sdk.policies: {"properties": {"template": {"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", "contentVersion": "1.0.0.0", "parameters": {"adminPassword": {"type": "securestring", "metadata": {"description": "Secure adminPassword"}}}, "variables"
: {}, "resources": [{"apiVersion": "2022-01-01", "type": "Microsoft.Network/publicIPAddresses", "name": "my2vmssLBPublicIP", "location": "eastus2", "tags": {}, "dependsOn": [], "properties": {"publicIPAllocationMethod": "Static"}, "sku": {"name": "Standard"}}, {"type": "Microsoft.Network/loadBalancers", "na
me": "my2vmssLB", "location": "eastus2", "tags": {}, "apiVersion": "2022-01-01", "dependsOn": ["Microsoft.Network/publicIpAddresses/my2vmssLBPublicIP"], "properties": {"backendAddressPools": [{"name": "my2vmssLBBEPool"}], "frontendIPConfigurations": [{"name": "loadBalancerFrontEnd", "properties": {"publicIP
Address": {"id": "/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/publicIPAddresses/my2vmssLBPublicIP"}}}], "loadBalancingRules": [{"name": "LBRule", "properties": {"frontendIPConfiguration": {"id": "[concat(resourceId('Microsoft.Network/loadBalance
rs', 'my2vmssLB'), '/frontendIPConfigurations/', 'loadBalancerFrontEnd')]"}, "backendAddressPool": {"id": "[concat(resourceId('Microsoft.Network/loadBalancers', 'my2vmssLB'), '/backendAddressPools/', 'my2vmssLBBEPool')]"}, "protocol": "tcp", "frontendPort": 80, "backendPort": 80, "enableFloatingIP": false, 
"idleTimeoutInMinutes": 5}}]}, "sku": {"name": "Standard"}}, {"type": "Microsoft.Network/networkSecurityGroups", "name": "my2vmssNSG", "apiVersion": "2015-06-15", "location": "eastus2", "tags": {}, "dependsOn": [], "properties": {"securityRules": [{"name": "default-allow-ssh", "properties": {"protocol": "Tc
p", "sourcePortRange": "*", "destinationPortRange": "22", "sourceAddressPrefix": "*", "destinationAddressPrefix": "*", "access": "Allow", "priority": 1000, "direction": "Inbound"}}]}}, {"type": "Microsoft.Network/loadBalancers/inboundNatRules", "apiVersion": "2022-01-01", "name": "my2vmssLB/NatRule", "locat
ion": "eastus2", "properties": {"frontendIPConfiguration": {"id": "[concat(resourceId('Microsoft.Network/loadBalancers', 'my2vmssLB'), '/frontendIPConfigurations/', 'loadBalancerFrontEnd')]"}, "backendAddressPool": {"id": "[concat(resourceId('Microsoft.Network/loadBalancers', 'my2vmssLB'), '/backendAddressP
ools/', 'my2vmssLBBEPool')]"}, "backendPort": 3389, "frontendPortRangeStart": "50000", "frontendPortRangeEnd": "50119", "protocol": "tcp", "idleTimeoutInMinutes": 5}, "dependsOn": ["[concat('Microsoft.Network/loadBalancers/', 'my2vmssLB')]"]}, {"type": "Microsoft.Compute/virtualMachineScaleSets", "name": "m
y2vmss", "location": "eastus2", "tags": {}, "apiVersion": "2024-03-01", "dependsOn": ["Microsoft.Network/loadBalancers/my2vmssLB", "Microsoft.Network/networkSecurityGroups/my2vmssNSG"], "properties": {"overprovision": true, "upgradePolicy": {"mode": "Manual", "rollingUpgradePolicy": {}, "automaticOSUpgradeP
olicy": {}}, "singlePlacementGroup": null, "resiliencyPolicy": {"resilientVMCreationPolicy": {"enabled": true}}, "virtualMachineProfile": {"storageProfile": {"osDisk": {"createOption": "FromImage", "caching": "ReadWrite", "managedDisk": {"storageAccountType": null}}, "imageReference": {"publisher": "Microso
ftWindowsServer", "offer": "WindowsServer", "sku": "2016-Datacenter", "version": "latest"}}, "osProfile": {"computerNamePrefix": "my2vm87aa", "adminUsername": "vmtest", "adminPassword": "[parameters('adminPassword')]"}, "networkProfile": {"networkInterfaceConfigurations": [{"name": "my2vm87aaNic", "properti
es": {"ipConfigurations": [{"name": "my2vm87aaIPConfig", "properties": {"subnet": {"id": "/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/virtualNetworks/myvmssVNET/subnets/myvmssSubnet"}, "loadBalancerBackendAddressPools": [{"id": "/subscriptions/0
b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/loadBalancers/my2vmssLB/backendAddressPools/my2vmssLBBEPool"}]}}], "networkSecurityGroup": {"id": "[resourceId('Microsoft.Network/networkSecurityGroups', 'my2vmssNSG')]"}, "primary": "true"}}]}}, "orchestrationMode": 
"Uniform"}, "sku": {"name": "Standard_D1_v2", "capacity": 2}}], "outputs": {"VMSS": {"type": "object", "value": "[reference(resourceId('Microsoft.Compute/virtualMachineScaleSets', 'my2vmss'),providers('Microsoft.Compute', 'virtualMachineScaleSets').apiVersions[0])]"}}}, "parameters": {"adminPassword": {"value": "Test123456789#"}}, "mode": "incremental"}}
urllib3.connectionpool: Starting new HTTPS connection (1): management.azure.com:443
urllib3.connectionpool: https://management.azure.com:443 "PUT /subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourcegroups/qinkai-test/providers/Microsoft.Resources/deployments/vmss_deploy_aGiadnMHBflAWT19YgWEcKU2DFEwyex1?api-version=2022-09-01 HTTP/1.1" 201 2670
cli.azure.cli.core.sdk.policies: Response status: 201
cli.azure.cli.core.sdk.policies: Response headers:
cli.azure.cli.core.sdk.policies:     'Cache-Control': 'no-cache'
cli.azure.cli.core.sdk.policies:     'Pragma': 'no-cache'
cli.azure.cli.core.sdk.policies:     'Content-Length': '2670'
cli.azure.cli.core.sdk.policies:     'Content-Type': 'application/json; charset=utf-8'
cli.azure.cli.core.sdk.policies:     'Expires': '-1'
cli.azure.cli.core.sdk.policies:     'Azure-AsyncOperation': 'https://management.azure.com/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourcegroups/qinkai-test/providers/Microsoft.Resources/deployments/vmss_deploy_aGiadnMHBflAWT19YgWEcKU2DFEwyex1/operationStatuses/08584863653008915274?api-version=2022-09-01'
cli.azure.cli.core.sdk.policies:     'x-ms-ratelimit-remaining-subscription-writes': '1199'
cli.azure.cli.core.sdk.policies:     'x-ms-request-id': 'bd682a71-730f-46ba-8167-98edc4d9ff3a'
cli.azure.cli.core.sdk.policies:     'x-ms-correlation-request-id': 'bd682a71-730f-46ba-8167-98edc4d9ff3a'
cli.azure.cli.core.sdk.policies:     'x-ms-routing-request-id': 'SOUTHEASTASIA:20240509T075951Z:bd682a71-730f-46ba-8167-98edc4d9ff3a'
cli.azure.cli.core.sdk.policies:     'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'
cli.azure.cli.core.sdk.policies:     'X-Content-Type-Options': 'nosniff'
cli.azure.cli.core.sdk.policies:     'X-Cache': 'CONFIG_NOCACHE'
cli.azure.cli.core.sdk.policies:     'X-MSEdge-Ref': 'Ref A: 09DF0CD291B44E0CB7113674C7705268 Ref B: MAA201060514025 Ref C: 2024-05-09T07:59:37Z'
cli.azure.cli.core.sdk.policies:     'Date': 'Thu, 09 May 2024 07:59:51 GMT'
cli.azure.cli.core.sdk.policies: Response content:
cli.azure.cli.core.sdk.policies: {"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Resources/deployments/vmss_deploy_aGiadnMHBflAWT19YgWEcKU2DFEwyex1","name":"vmss_deploy_aGiadnMHBflAWT19YgWEcKU2DFEwyex1","type":"Microsoft.Resources/deployments","prope
rties":{"templateHash":"4682945623004166168","parameters":{"adminPassword":{"type":"SecureString"}},"mode":"Incremental","provisioningState":"Accepted","timestamp":"2024-05-09T07:59:48.5520615Z","duration":"PT0.0003417S","correlationId":"bd682a71-730f-46ba-8167-98edc4d9ff3a","providers":[{"namespace":"Micro
soft.Network","resourceTypes":[{"resourceType":"publicIPAddresses","locations":["eastus2"]},{"resourceType":"loadBalancers","locations":["eastus2"]},{"resourceType":"networkSecurityGroups","locations":["eastus2"]},{"resourceType":"loadBalancers/inboundNatRules","locations":["eastus2"]}]},{"namespace":"Micro
soft.Compute","resourceTypes":[{"resourceType":"virtualMachineScaleSets","locations":["eastus2"]}]}],"dependencies":[{"dependsOn":[{"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/publicIPAddresses/my2vmssLBPublicIP","resourceType":"Microsoft.
Network/publicIPAddresses","resourceName":"my2vmssLBPublicIP"}],"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/loadBalancers/my2vmssLB","resourceType":"Microsoft.Network/loadBalancers","resourceName":"my2vmssLB"},{"dependsOn":[{"id":"/subscri
ptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/loadBalancers/my2vmssLB","resourceType":"Microsoft.Network/loadBalancers","resourceName":"my2vmssLB"}],"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.N
etwork/loadBalancers/my2vmssLB/inboundNatRules/NatRule","resourceType":"Microsoft.Network/loadBalancers/inboundNatRules","resourceName":"my2vmssLB/NatRule"},{"dependsOn":[{"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/loadBalancers/my2vmssLB
","resourceType":"Microsoft.Network/loadBalancers","resourceName":"my2vmssLB"},{"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Network/networkSecurityGroups/my2vmssNSG","resourceType":"Microsoft.Network/networkSecurityGroups","resourceName":"my2vmssNSG"}],"id":"/subscriptions/0b1f6471-1bf0-4dda-aec3-cb9272f09590/resourceGroups/qinkai-test/providers/Microsoft.Compute/virtualMachineScaleSets/my2vmss","resourceType":"Microsoft.Compute/virtualMachineScaleSets","resourceName":"my2vmss"}]}}

ReaNAiveD May 09 '24 08:05