terraform-provider-tailscale
The `tailscale_tailnet_key` resource should handle expired keys
Describe the bug
A few months after I created a few `tailscale_tailnet_key` resources in my Terraform code, those keys expired in Tailscale control and were subsequently deleted by the API. Now when I run Terraform again, the provider fails to fetch these keys and blocks my deployment.
To Reproduce
Steps to reproduce the behaviour:
- Terraform apply a `tailscale_tailnet_key` with `reusable = true`
- Wait for the key to expire (or revoke it so you don't have to wait around)
- Terraform plan/apply using the same configuration
- Fail to fetch key with 404
Error: Failed to fetch key
with tailscale_tailnet_key.reusable,
on my_keys.tf line 1, in resource "tailscale_tailnet_key" "reusable":
1: resource "tailscale_tailnet_key" "reusable" {
404 page not found (404)
Expected behaviour
The provider detects an expired key by checking that the expiry time in state is earlier than the current time and that the API returns a 404 for the resource. It then removes the resource from state, so the next plan recreates the key and restores the desired end state.
Desktop (please complete the following information):
- OS: linux
- Terraform Version 1.1.9
- Provider Version 0.12.2
Additional context
This is similar to the issue raised in #59, where keys would vanish from the API, except that there the cause was reusable = false.
Agreed that this should be addressed. It is a bit tricky, in that the API key which issues the auth key also has a 90-day expiration and may itself have expired.
Work on https://github.com/tailscale/tailscale/issues/3243 is underway now, to provide an automatable workflow where API keys won't expire and require manual intervention. We'll update this provider to use that mechanism when it is available.
Currently, the provider handles a 404 returned from the API when the key is non-reusable here:
https://github.com/tailscale/terraform-provider-tailscale/blob/main/tailscale/resource_tailnet_key.go#L108
The fact that reusable keys expire naturally does cause an issue, but I think the best way to work with this is to use the `lifecycle` meta-argument with a condition on a `time_rotating` resource from the `hashicorp/time` provider:
    resource "time_rotating" "tskey" {
      rotation_days = 30
    }

    resource "tailscale_tailnet_key" "tskey" {
      reusable      = true
      preauthorized = true

      lifecycle {
        replace_triggered_by = [time_rotating.tskey]
      }
    }
The above will ensure that the first apply after 30 days will recreate the key.
Alternatively, if you're also using HashiCorp Vault as part of your stack, there are a couple of Vault plugins for issuing tailnet keys:
https://github.com/bloominlabs/vault-plugin-secrets-tailscale
https://github.com/davidsbond/vault-plugin-tailscale
FYI @davidsbond, there's currently a bug (https://github.com/hashicorp/terraform-provider-time/issues/118) in the `time_rotating` resource that means this doesn't work as-is.
There's an additional suggested workaround: https://github.com/hashicorp/terraform-provider-time/issues/118#issuecomment-1316056478
I ran into a related issue. If you set `reusable = false`, the provider still tries to reuse the same key on the next run, even though it is revoked and thus cannot be used more than once (hence `reusable = false`).
@DentonGentry @!davidsbond this seems related to https://github.com/tailscale/terraform-provider-tailscale/issues/198, not sure which y'all want to keep open.
Now that https://github.com/tailscale/tailscale/issues/3243 is implemented and closed, any chance to move this major issue forward? Right now my team is dealing with it via a recurring calendar event that attempts to remind us to rotate the key... definitely not my favorite thing 😁
[edited Mar 1, 2023 by DentonGentry to remove the @-mention of davidsbond, who developed this provider but has handed off ongoing support for Tailscale to handle]
I have created PR #221, which adds a new field where you can reference a `time_rotating` resource to force re-creation of the key. It is inspired by the `azuread_application_password` resource.
> Currently, the provider handles a 404 returned from the API when the key is non-reusable here:
> https://github.com/tailscale/terraform-provider-tailscale/blob/main/tailscale/resource_tailnet_key.go#L108
> The fact that reusable keys expire naturally does cause an issue, but I think the best way to work with this is to use the `lifecycle` meta-argument with a condition on a `time_rotating` resource from the `hashicorp/time` provider:
>
>     resource "time_rotating" "tskey" {
>       rotation_days = 30
>     }
>
>     resource "tailscale_tailnet_key" "tskey" {
>       reusable      = true
>       preauthorized = true
>
>       lifecycle {
>         replace_triggered_by = [time_rotating.tskey]
>       }
>     }
>
> The above will ensure that the first apply after 30 days will recreate the key.
This doesn't seem to work with what's being suggested, even when the key is set to never expire.
As the `time_rotating` with `lifecycle.replace_triggered_by` workaround isn't effective, I've automated this using the terraform `-replace` argument. For example:
terraform plan -replace tailscale_tailnet_key.tskey -out planfile
@Gowiem - this might avoid the manual intervention for you.
`-replace` didn't work for me either. It gives the same 404. Any idea why?
It's because even replacement requires Terraform to try and refresh the state of the existing resource, before deciding what action to take. My understanding is that the provider should be "handling" the 404, so to speak -- it should treat it as a resource that has been deleted remotely. (I asked about this issue on /r/terraform, and the conclusion was that this was a bug, rather than a missing feature)
The code currently checks `case tailscale.IsNotFound(err) && !reusable`. I wonder if we can just remove the `reusable` check. If fetching the key 404s, shouldn't Terraform just treat it as a deleted resource regardless?
We've discovered in #306 that this change had the side effect of making Terraform recreate single-use keys, which may result in other resources that use the key being updated or replaced. I am thinking of reverting the change to go back to the historic behaviour (ignore invalid keys) and adding a new `recreate_if_invalid` attribute that makes Terraform recreate the key in cases where that is needed. If you have any feedback on this idea, or would like to propose an alternative, please comment in #306.
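As a rough sketch of how such an opt-in could behave, the Go snippet below models the decision with standard-library types only. It assumes `recreate_if_invalid` is a boolean, which the comment above does not state. `keyState` and `onInvalidKey` are hypothetical names, not the provider's implementation.

```go
package main

import "fmt"

// keyState is a hypothetical stand-in for the key's Terraform state.
// RecreateIfInvalid models the proposed recreate_if_invalid attribute,
// assumed here to be a boolean.
type keyState struct {
	ID                string
	RecreateIfInvalid bool
}

// onInvalidKey is called when the API reports the key as invalid or missing.
// It returns true if the key was dropped from state (so the next plan
// recreates it), and false if the historic behaviour applies and the
// invalid key is left in state untouched.
func onInvalidKey(s *keyState) bool {
	if s.RecreateIfInvalid {
		s.ID = ""
		return true
	}
	return false
}

func main() {
	optedIn := &keyState{ID: "k1", RecreateIfInvalid: true}
	legacy := &keyState{ID: "k2"}

	fmt.Println(onInvalidKey(optedIn), optedIn.ID == "")
	fmt.Println(onInvalidKey(legacy), legacy.ID)
}
```

Defaulting to the historic behaviour keeps single-use keys stable for existing configurations, while reusable keys that need rotation can opt in.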