go-azure-sdk icon indicating copy to clipboard operation
go-azure-sdk copied to clipboard

`resourcemanager/pollers`: updating LRO Pollers to also check the `provisioningState` / resource existence

Open tombuildsstuff opened this issue 2 years ago • 1 comments

Community Note

  • Please vote on this issue by adding a :thumbsup: reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

There’s a number of Azure APIs which are LROs but mark themselves as successfully provisioned (e.g. the LRO is completed) before the provisioningState is fully updated.

Some example of this can be found in:

  • https://github.com/hashicorp/terraform-provider-azurerm/pull/13525
  • https://github.com/hashicorp/terraform-provider-azurerm/pull/21290
  • https://github.com/hashicorp/terraform-provider-azurerm/pull/23721

This is true both in the Create/Update cases (where the Resource isn’t fully usable) and during Deletion (where the Resource continues to kick around despite being deleted) - whilst these are both API bugs - this is something we should be able to workaround by polling.

This is exacerbated when a resource uses a Private Endpoint, since these take longer to provision/can kick around (which this PR looks to fix) - however this attempts to fix this on individual resources.

As such there’s a couple of scenarios to be concerned about here:

  1. When creating/updating a Resource with an LRO we need to first poll the LRO until (successful) completion and then invoke a provisioningState poller.
  2. When deleting a Resource with an LRO we need to first poll the LRO until (successful) completion and then invoke a secondary “get” poller (existence?), to do a GET on the Resource until it’s consistently returning a 404.

Proposal

We should introduce a new “combined”/unified poller type (needs a better name) for Resource Manager handling both Create/Update and Deletions.

In the event that the LRO fails, we shouldn’t attempt to proceed to the provisioningState poller - but instead surface this error.

In the event that the Resource doesn’t have a provisioningState, we should skip this and return to remain compatible?

Pseudo-code (for the Create/Update scenario):


func (c *combinedPoller) Poll(ctx context.Context) (*pollers.PollResult, error) {
	// First poll on the Long Running Operation through to completion
	lro, err := longRunningOperationPollerFromResponse(c.initialResponse, c.client.Client)
	if err != nil {
		// ...
	}
	lroResult, err := lro.Poll(ctx)
	if err != nil {
		// ...
	}
	if lroResult.Status != pollers.PollingStatusSucceeded {
		return lroResult, err
	}

	// Now that we've polled on the LRO, let's check the `provisioningState` is `Succeeded`
	pollingInterval := lroResult.PollInterval
	provisioningState, err := provisioningStatePollerFromResponse(c.initialResponse, c.lroIsSelfReference, c.client, pollingInterval)
	if err != nil {
		// ...
	}
	provisioningResult, err := provisioningState.Poll(ctx)
	if err != nil {
		// ...
	}
	return provisioningResult, err
}

Two separate pollers would be needed - one for Create/Update (PUT/POST/PATCH LROs) and Delete (DELETE LROs) - but this should be possible to configure this in `resourcemanager.PollerFromResponse to have this hooked up across the SDK.

Notes

Doing this would introduce an extra ARM call (minimum) per resource, but this would increase the reliability which feels like a worthwhile tradeoff.

tombuildsstuff avatar Feb 15 '23 10:02 tombuildsstuff

@tombuildsstuff , specifically for my service (Azure NetApp Files, this also applies to resource creation while it responds as created while it is still in progress, so this case will need to be taken care of as well.

paulomarquesc avatar Apr 17 '23 17:04 paulomarquesc