terraform-provider-tailscale icon indicating copy to clipboard operation
terraform-provider-tailscale copied to clipboard

The `tailscale_tailnet_key` resource should handle expired keys

Open jtcressy opened this issue 2 years ago • 3 comments

Describe the bug A few months after generating a few tailscale_tailnet_key resources in my terraform code, those keys expired in tailscale control and were subsequently deleted by the api. Now when I run my Terraform again after the keys have expired, the provider fails to fetch these keys and blocks my Terraform deployment.

To Reproduce Steps to reproduce the behaviour:

  1. Terraform apply a tailscale_tailnet_key with reusable = true
  2. Wait for the key to expire (or revoke it so you don't have to wait around)
  3. Terraform plan/apply using same configuration
  4. Fail to fetch key with 404
Error: Failed to fetch key

  with tailscale_tailnet_key.reusable,
  on my_keys.tf line 1, in resource "tailscale_tailnet_key" "reusable":
   1: resource "tailscale_tailnet_key" "reusable" {

404 page not found (404)

Expected behaviour The provider detects an expired key by evaluating that the expire time/date in state is less than the current time/date and that the resource is a 404 response from the api, removes the resource from state, then a plan is calculated to recreate the resource to ensure the desired end state.

Desktop (please complete the following information):

  • OS: linux
  • Terraform Version 1.1.9
  • Provider Version 0.12.2

Additional context This is similar to an issue brought up in #59 where keys would vanish from the API, except due to reusable = false

jtcressy avatar Aug 29 '22 20:08 jtcressy

Agreed that this should be addressed. It is a bit tricky in that the API key which issues the auth key also has a 90 day expiration and may also be expired.

Work on https://github.com/tailscale/tailscale/issues/3243 is underway now, to provide an automatable workflow where API keys won't expire and require manual intervention. We'll update this provider to use that mechanism when it is available.

DentonGentry avatar Aug 29 '22 21:08 DentonGentry

Currently, the provider handles a 404 returned from the API when the key is non-reusable here:

https://github.com/tailscale/terraform-provider-tailscale/blob/main/tailscale/resource_tailnet_key.go#L108

The fact that reusable keys expire naturally does cause an issue, but I think the best way to work with this is to use the lifecycle meta argument with a condition on a time_rotation resource from the hashicorp/time provider:

resource "time_rotating" "tskey" {
  rotation_days = 30
}

resource "tailscale_tailnet_key" "tskey" {
  reusable      = true
  preauthorized = true

  lifecycle {
    replace_triggered_by = [time_rotating.tskey]
  }
}

The above will ensure that the first apply after 30 days will recreate the key.

davidsbond avatar Aug 30 '22 11:08 davidsbond

Alternatively, if you're using HashiCorp Vault as part of your stack also, there are a couple vault plugins for issuing tailnet keys:

https://github.com/bloominlabs/vault-plugin-secrets-tailscale https://github.com/davidsbond/vault-plugin-tailscale

davidsbond avatar Aug 30 '22 12:08 davidsbond

FYI @davidsbond, there's currently a bug (https://github.com/hashicorp/terraform-provider-time/issues/118) in the time_rotation resource that means this this doesn't work-as is.

There's an additonal suggested workaround https://github.com/hashicorp/terraform-provider-time/issues/118#issuecomment-1316056478

milesarmstrong avatar Nov 29 '22 11:11 milesarmstrong

I ran into a related issue. If you set reusable = false, the provider still tries to reuse the same key on the next run, even though it is revoked and thus cannot be used more than once (hence reusable = false).

sspencer-hubble avatar Dec 12 '22 18:12 sspencer-hubble

@DentonGentry @!davidsbond this seems related to https://github.com/tailscale/terraform-provider-tailscale/issues/198, not sure which ya'll want to keep open.

Now that https://github.com/tailscale/tailscale/issues/3243 is implemented and closed, any chance to move this major issue forward? Right now my team is dealing with via a recurring calendar event that attempts to remind us to rotate the key... definitely not my favorite thing 😁

[edited Mar 1, 2023 by DentonGentry to remove the @-mention of davidsbond, who developed this provider but has handed off ongoing support for Tailscale to handle]

Gowiem avatar Mar 01 '23 16:03 Gowiem

I have created a PR #221 that adds a new field, where you can add a time_rotation resource which forces a re-creating of the key. Inspired by the azuread_application_password resource

omBratteng avatar Mar 23 '23 23:03 omBratteng

Currently, the provider handles a 404 returned from the API when the key is non-reusable here:

https://github.com/tailscale/terraform-provider-tailscale/blob/main/tailscale/resource_tailnet_key.go#L108

The fact that reusable keys expire naturally does cause an issue, but I think the best way to work with this is to use the lifecycle meta argument with a condition on a time_rotation resource from the hashicorp/time provider:

resource "time_rotating" "tskey" {
  rotation_days = 30
}

resource "tailscale_tailnet_key" "tskey" {
  reusable      = true
  preauthorized = true

  lifecycle {
    replace_triggered_by = [time_rotating.tskey]
  }
}

The above will ensure that the first apply after 30 days will recreate the key.

This doesn't seem to work with what's being suggested, even when the key is set to never expire.

tath81 avatar May 16 '23 17:05 tath81

As the time_rotating with lifecycle.replace_triggered_by workaround isn't effective, I've automated this using the terraform -replace argument. For example:

terraform plan -replace tailscale_tailnet_key.tskey -out planfile

@Gowiem - this might avoid the manual intervention for you.

jinnko avatar May 28 '23 20:05 jinnko

-replace didn't work for me either. It gives same 404. Any idea why?

kphatak avatar Jul 12 '23 17:07 kphatak

It's because even replacement requires Terraform to try and refresh the state of the existing resource, before deciding what action to take. My understanding is that the provider should be "handling" the 404, so to speak -- it should treat it as a resource that has been deleted remotely. (I asked about this issue on /r/terraform, and the conclusion was that this was a bug, rather than a missing feature)

The code currently checks case tailscale.IsNotFound(err) && !reusable -- I wonder if we can just remove the reusable check. If fetching the key 404s -- shouldn't Terraform just treat it as a deleted resource regardless?

gautamg795 avatar Jul 22 '23 09:07 gautamg795

We've discovered in #306 that this change had a side effect of making Terraform recreate single-use keys, which may result in other resources that use the key being updated/replaced. I am thinking of reverting the change to go back to the historic behaviour (ignore invalid keys), adding a new recreate_if_invalid attribute that will make Terraform recreate the key for cases when that is needed. If you have any feedback on this idea, or would like to propose an alternative, please comment in #306.

knyar avatar Nov 28 '23 11:11 knyar