terraform-provider-databricks icon indicating copy to clipboard operation
terraform-provider-databricks copied to clipboard

[DRAFT] Retry storage credential create/update to allow for IAM propagation

Open neinkeinkaffee opened this issue 3 years ago • 5 comments
trafficstars

Retry the create/update operation for storage credentials if the error indicates that we may have to allow more time for IAM propagation. The timeout of 2 minutes could be further decreased, the current number is taken from corresponding timeouts for the IAM propagation case in the AWS Terraform provider.

neinkeinkaffee avatar Jul 12 '22 21:07 neinkeinkaffee

Codecov Report

Merging #1454 (f08833a) into master (73d5b17) will decrease coverage by 0.40%. The diff coverage is 98.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1454      +/-   ##
==========================================
- Coverage   90.12%   89.71%   -0.41%     
==========================================
  Files         126      129       +3     
  Lines       10218    10446     +228     
==========================================
+ Hits         9209     9372     +163     
- Misses        642      695      +53     
- Partials      367      379      +12     
Impacted Files Coverage Δ
common/http.go 98.01% <ø> (-0.01%) :arrow_down:
mws/resource_mws_credentials.go 86.95% <96.29%> (+2.95%) :arrow_up:
catalog/resource_storage_credential.go 100.00% <100.00%> (ø)
exporter/importables.go 85.31% <0.00%> (-5.92%) :arrow_down:
exporter/util.go 79.55% <0.00%> (-1.13%) :arrow_down:
pipelines/resource_pipeline.go 92.17% <0.00%> (-0.97%) :arrow_down:
sql/resource_sql_widget.go 87.50% <0.00%> (-0.70%) :arrow_down:
sql/resource_sql_endpoint.go 97.43% <0.00%> (-0.13%) :arrow_down:
mws/resource_mws_workspaces.go 90.15% <0.00%> (-0.12%) :arrow_down:
... and 14 more

codecov-commenter avatar Jul 12 '22 21:07 codecov-commenter

I have moved the retryOnError method into http.go in the common package in order to reuse it for the MWS credential. I'm not sure if that's the best location. I'm also wondering whether it might be better to keep the value of the timeout duration for IAM propagation separate from the resource timeouts for creation and update, given that one might not want to retry on IAM error until the overall resource creation/update timeout limit is reached. It would have the additional benefit of not having to pass down the resource timeout.

neinkeinkaffee avatar Jul 23 '22 18:07 neinkeinkaffee

@neinkeinkaffee I’m not sure if common package is the right place for it. Cost of refactoring/supporting it in the common package is higher than copy paste of approximately 10 lines. So far i see it used only two times (for IAM roles). This is really retry-and-wait for error that might be a valid error after one or two minutes. Common package does another kind of error retries, which are immediate. More than that, common package is being refactored into standalone SDK as we speak, so I don’t want more things added over there.

bottom line: these retries are great, but I think they belong to mws and catalog packages.

nfx avatar Jul 23 '22 19:07 nfx

Makes sense! I've moved the retry code back into the catalog and mws.

neinkeinkaffee avatar Jul 23 '22 20:07 neinkeinkaffee

any updates on this?

oleksandr-gubchenko avatar Apr 19 '23 14:04 oleksandr-gubchenko