azure-sdk-for-go icon indicating copy to clipboard operation
azure-sdk-for-go copied to clipboard

TemplateClient.Validate in armresource pkg is not functional on error case

Open haitch opened this issue 1 year ago • 9 comments

Bug Report

  • import path of package in question, "github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources

  • SDK version: all (I believe)

  • output of go version 1.20 (doesn't matter)

  • What happened? given following code, almost nothing works in error case

		deployment := armresources.Deployment{
			Location: azcore_to.Ptr(clonedMC.Location),
			Properties: &armresources.DeploymentProperties{
				Mode:       azcore_to.Ptr(armresources.DeploymentModeIncremental),
				Template:   templateMap,
				Parameters: parametersMap,
			},
		}

		poller, requestError := deploymentClient.BeginValidateAtSubscriptionScope(ctx, "dummyDeployName", deployment, nil)
		if requestError != nil {
			return fmt.Errorf("failed to send validateTemplate request with error: %w\n", requestError)
		}

		pollOptions := runtime.PollUntilDoneOptions{
			Frequency: 5 * time.Second, // Pass zero to accept the default value (30s).
		}
		response, pullingError := poller.PollUntilDone(ctx, &pollOptions)
		if err != nil {
			return fmt.Errorf("failed to polling result for validateTemplate with error: %w\n", pullingError)
		}

		validateResult := response.DeploymentValidateResult
		// work on validateResult
		...
  • 400 BadRequest (InvalidTemplate)

  • 400 BadRequest (InvalidTemplateDeployment-DisallowedByPolicy)

    • Microsoft.Resource/Deployment/validate have API design flaw as well, that response validation error as 400
      • After validating my template, and my template have some error
      • instead of return 200, and return error in response payload, they return 400
    • this return as request error, and Puller won't work.
  • What did you expect or want to happen?

    • puller constructor error on invalid status should include more details
    • this deployment API ValidateAtSubscriptionScope is generated with async pattern, while the API is not async API, I managed to find internal contact telling me they are making 2023-07-01 templateValidation async API, and it's mixed between sync and async.
    • not sure if we are first one using armresource.DeploymentClient.Validate, but this scenario is really not use-able. how C#/Python client handling this scenario.
  • How can we reproduce it?

    • create a policy block a test tag
{
		"id": "/subscriptions/<your subscription>/providers/Microsoft.Authorization/policyAssignments/305bef06aca94425b03f5006",
		"name": "305bef06aca94425b03f5006",
		"properties": {
		  "displayName": "not allow tag owned-by-haitao (testing)",
		  "enforcementMode": "Default",
		  "parameters": {
			"effect": {
			  "value": "Deny"
			},
			"tagName": {
			  "value": "owned-by-haitao"
			}
		  },
		  "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/36fd7371-8eb7-4321-9c30-a7100022d048",
		  "scope": "/subscriptions/<your subscription>"
		},
		"type": "Microsoft.Authorization/policyAssignments"
	}
  • validate a template that have this tag
{
		"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
		"contentVersion": "1.0.0.0",
		"resources": [
		  {
			"apiVersion": "2018-05-01",
			"id": "/subscriptions/<your subscription>/providers/Microsoft.Resources/resourcegroups/testRG",
			"location": "eastus",
			"name": "testRG",
			"tags": {
			  "owned-by-haitao": "test"
			},
			"type": "Microsoft.Resources/resourceGroups"
		  },
		  {
			"apiVersion": "2021-03-01",
			"properties": {
			  "securityRules": []
			},
			"name": "test-nsg",
			"type": "Microsoft.Network/networkSecurityGroups",
			"location": "eastus",
			"tags": {
			  "owned-by-haitao": "test"
			}
		  }
		]
	  }
  • Anything we should know about your environment.

haitch avatar Aug 14 '23 07:08 haitch

OpenAPI for this operation https://github.com/Azure/azure-rest-api-specs/blob/main/specification/resources/resource-manager/Microsoft.Resources/stable/2021-04-01/resources.json#L1331

jhendrixMSFT avatar Aug 14 '23 23:08 jhendrixMSFT

so these 2 condition (longRunningOperation, and defining 400 with responses section) should not be combined together, we need to kick out one of them, I vote to kick out "x-ms-long-running-operation": true, at least for this APIVersion.

haitch avatar Aug 15 '23 00:08 haitch

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.

github-actions[bot] avatar Aug 16 '23 09:08 github-actions[bot]

I suspect that removing x-ms-long-running-operation will break the common case. It would probably be better to remove the schema for the 400 response. That said, IMO what we need to understand here is the intent. I'm going to guess that modeling the 400 as a success is the short-circuit case when the RP doesn't need to initiate a LRO to perform the work. If this is the case, then I don't believe we have a way to model such LROs (it might just happen to work for LROs that are a RELO, but this operation is a POST). It might also go against the Azure design guidelines.

CC @JeffreyRichter

jhendrixMSFT avatar Aug 22 '23 14:08 jhendrixMSFT

Sidenote for @jhendrixMSFT : I see runtime.PollUntilDoneOptions is useful to customers and in the runtime package - we should consider moving it.

I found the REST API docs (certainly created from the swagger) for this here: https://learn.microsoft.com/en-us/rest/api/resources/deployments/validate-at-subscription-scope#deploymentproperties. The docs are so bad, I'd be surprised if anyone could use this correctly.

If this is supposed to be an LRO, then there are no docs describing how to poll for the status. For example, no mention of a response header with a URL to use for polling. The docs indicating that 400 returns a DeploymentValidateResult seems anti-HTTP and bad design. 400 means that the operation did not work at all.

We'd have to get @rkmanda and the service team involved to at least improve the docs and probably the swagger so it reflects how the operation actually works today. Then maybe, we fix how the operation works to make it follow HTTP (and Azure) guidelines so it works as people might expect it to.

JeffreyRichter avatar Aug 22 '23 15:08 JeffreyRichter

The docs indicating that 400 returns a DeploymentValidateResult seems anti-HTTP and bad design. 400 means that the operation did not work at all.

exactly, this should be a 200 response, with some response payload, that describe validation result.

haitch avatar Aug 24 '23 16:08 haitch

any update on this is highly appreciated

sushmak295 avatar Aug 24 '23 17:08 sushmak295

any update on this is highly appreciated

sushmak295 avatar Sep 21 '23 23:09 sushmak295

This affects us as well. It seems that all BeginValidate* methods have the same issue, at least we are facing it with BeginValidate

olegch avatar Mar 04 '24 21:03 olegch