go-azure-sdk
go-azure-sdk copied to clipboard
`resourcemanager/pollers`: updating LRO Pollers to also check the `provisioningState` / resource existence
Community Note
- Please vote on this issue by adding a :thumbsup: reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Description
There’s a number of Azure APIs which are LROs but mark themselves as successfully provisioned (e.g. the LRO is completed) before the provisioningState
is fully updated.
Some example of this can be found in:
- https://github.com/hashicorp/terraform-provider-azurerm/pull/13525
- https://github.com/hashicorp/terraform-provider-azurerm/pull/21290
- https://github.com/hashicorp/terraform-provider-azurerm/pull/23721
This is true both in the Create/Update cases (where the Resource isn’t fully usable) and during Deletion (where the Resource continues to kick around despite being deleted) - whilst these are both API bugs - this is something we should be able to workaround by polling.
This is exacerbated when a resource uses a Private Endpoint, since these take longer to provision/can kick around (which this PR looks to fix) - however this attempts to fix this on individual resources.
As such there’s a couple of scenarios to be concerned about here:
- When creating/updating a Resource with an LRO we need to first poll the LRO until (successful) completion and then invoke a
provisioningState
poller. - When deleting a Resource with an LRO we need to first poll the LRO until (successful) completion and then invoke a secondary “get” poller (existence?), to do a GET on the Resource until it’s consistently returning a 404.
Proposal
We should introduce a new “combined”/unified poller type (needs a better name) for Resource Manager handling both Create/Update and Deletions.
In the event that the LRO fails, we shouldn’t attempt to proceed to the provisioningState
poller - but instead surface this error.
In the event that the Resource doesn’t have a provisioningState
, we should skip this and return to remain compatible?
Pseudo-code (for the Create/Update scenario):
func (c *combinedPoller) Poll(ctx context.Context) (*pollers.PollResult, error) {
// First poll on the Long Running Operation through to completion
lro, err := longRunningOperationPollerFromResponse(c.initialResponse, c.client.Client)
if err != nil {
// ...
}
lroResult, err := lro.Poll(ctx)
if err != nil {
// ...
}
if lroResult.Status != pollers.PollingStatusSucceeded {
return lroResult, err
}
// Now that we've polled on the LRO, let's check the `provisioningState` is `Succeeded`
pollingInterval := lroResult.PollInterval
provisioningState, err := provisioningStatePollerFromResponse(c.initialResponse, c.lroIsSelfReference, c.client, pollingInterval)
if err != nil {
// ...
}
provisioningResult, err := provisioningState.Poll(ctx)
if err != nil {
// ...
}
return provisioningResult, err
}
Two separate pollers would be needed - one for Create/Update (PUT/POST/PATCH LROs) and Delete (DELETE LROs) - but this should be possible to configure this in `resourcemanager.PollerFromResponse to have this hooked up across the SDK.
Notes
Doing this would introduce an extra ARM call (minimum) per resource, but this would increase the reliability which feels like a worthwhile tradeoff.
@tombuildsstuff , specifically for my service (Azure NetApp Files, this also applies to resource creation while it responds as created while it is still in progress, so this case will need to be taken care of as well.