
[ISSUE] Issue with `databricks_mws_workspaces` resource token rotation

Open mikekohmhe opened this issue 1 year ago • 3 comments

There is a similar, but not identical, existing issue: https://github.com/databricks/terraform-provider-databricks/issues/2750 . Unlike that one, this issue does not use time_rotating.

Configuration

I believe our code is basically the same as the example shown at https://registry.terraform.io/providers/databricks/databricks/1.25.1/docs/resources/mws_workspaces

resource "databricks_mws_workspaces" "this" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = var.prefix
  aws_region     = var.region

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id

  token {}
}

output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}

I would guess it doesn't matter for the error, but in our particular code, the token block has lifetime_seconds set to 90 days (instead of leaving it unset, which defaults to 30 days).
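For illustration, the resource with that setting would look roughly like the sketch below. The value 7776000 is simply 90 days expressed in seconds and is an assumption about how the lifetime is written; everything else is unchanged from the configuration above.

```hcl
resource "databricks_mws_workspaces" "this" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = var.prefix
  aws_region     = var.region

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id

  token {
    # Assumed value: 90 days * 24 h * 60 min * 60 s = 7776000 seconds.
    lifetime_seconds = 7776000
  }
}
```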

Expected Behavior

Terraform apply succeeds.

Actual Behavior

The errors that appeared during cron-scheduled Terraform applies on 2024-01-09, at around 6:00 PM, 8:41 PM, and 9:12 PM EST, were variants of

Error: cannot read xxx: Invalid access token.

with xxx,

A subsequent cron-scheduled Terraform apply succeeded with no intervention.

I don't know if it matters, but I see two tokens with creation times around the time of the problem.

databricks --profile xx token-management list --created-by-username xx --output json | jq 'map((.creation_time, .expiry_time) |= (. / 1000 | strftime("%Y-%m-%d %H:%M:%S")))'
[
  {
...
    "creation_time": "2024-01-10 11:02:21",
    "expiry_time": "2024-04-09 11:02:21",
...
  },
  {
...
    "creation_time": "2024-01-10 01:42:32",
    "expiry_time": "2024-04-09 01:42:32",
...
  }
]

Databricks support says that if the issue happens again, we should run with export TF_LOG="DEBUG", delete the tokens, and apply again.

Steps to Reproduce

  1. terraform apply

Terraform and provider versions

Terraform version 1.5.7, provider version 1.25.1

Is it a regression?

I didn't try any other versions. I can't see any notes about changes to databricks_mws_workspaces in provider versions up to 1.36.3.

Debug Output

Important Factoids

Would you like to implement a fix?

mikekohmhe, Feb 14 '24 17:02

What are you using the token for? Are you using the token for anything other than to configure the provider to talk to this new workspace? Reason I ask is that I'm working on a mechanism to allow you to use the account-level provider to manage workspace-level resources. This will eliminate the need to get a token from this resource, and it should eliminate a class of issues that arises when rotating the token created by this resource.

mgyucht, Feb 14 '24 19:02

See #3188.

mgyucht, Feb 14 '24 19:02

Hmm, the token is used for just about everything that "manages" the workspace, which is what the docs suggest doing:

Code that creates workspaces and code that manages workspaces must be in separate terraform modules to avoid common confusion between provider = databricks.mws and provider = databricks.created_workspace. This is why we specify databricks_host and databricks_token outputs, that have to be used in the latter modules:
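As a rough sketch of what that looks like in a separate workspace-management module (not necessarily our exact code): the variable names below are hypothetical inputs fed from the databricks_host and databricks_token outputs of the workspace-creating module, and the databricks_host output itself is assumed to exist alongside the databricks_token output shown earlier.

```hcl
# Hypothetical inputs, populated from the outputs of the module
# that creates the workspace.
variable "databricks_host" {
  type = string
}

variable "databricks_token" {
  type      = string
  sensitive = true
}

# Workspace-level provider, authenticated with the token generated
# by databricks_mws_workspaces.
provider "databricks" {
  alias = "created_workspace"
  host  = var.databricks_host
  token = var.databricks_token
}
```

With this layout, every workspace-level resource depends on the generated token still being valid, which is why an expired or rotated token surfaces as "Invalid access token" during apply.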

mikekohmhe, Feb 15 '24 19:02

We had another similar Terraform failure in a different SDLC. However, in this SDLC, when I checked for tokens with token-management list, there were none. Reapplying resolved the issue. So ...

In the original failure, two tokens were somehow created in the midst of three apply failures. This latest failure is more straightforward: the existing token had simply expired.

mikekohmhe, Feb 20 '24 22:02