terraform-provider-tfe
Delay premature workspace run until dependent resources are available, e.g. workspace variables
When creating a workspace in Terraform Cloud, once in a while a run will kick off before the workspace's variables and other resources (such as notification configuration) have been successfully added, causing the run to fail. This is essentially a race condition. Manually starting a run afterwards works, since all the configuration is in place by then.
It would be beneficial to have some logic that checks whether dependent resources such as workspace variables have already been created, or some sort of delay attribute.
Thanks in advance!
Hello!
I assume you are using the VCS-driven run workflow and a commit is triggering a run before your required settings (e.g. variables) are created, based on your description.
This is what the `queue_all_runs` argument on `tfe_workspace` is for:
> `queue_all_runs` - (Optional) Whether the workspace should start automatically performing runs immediately after its creation. When set to false, runs triggered by a webhook (such as a commit in VCS) will not be queued until at least one run has been manually queued. Defaults to true. Note: This default differs from the Terraform Cloud API default, which is false. The provider uses true as any workspace provisioned with false would need to then have a run manually queued out-of-band before accepting webhooks.
Using this, you can then trigger a run yourself (in the UI or through scripting/automation) to have the workspace begin accepting webhooks from VCS. This is useful if you don't control the VCS repository and can't guarantee no one will commit to your configuration while you're creating the workspaces that will provision it. (If you do control it, I would restrict commits during setup. You're right that this is an out-of-band race condition that can't really be handled by the provider, other than by ignoring webhooks through this option until you say all is ready!)
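As a minimal sketch (the name and organization here are assumptions), that looks like:

```hcl
resource "tfe_workspace" "example" {
  name         = "my-workspace-name"
  organization = "my-org" # assumed organization name

  # Ignore VCS webhooks until at least one run has been queued manually.
  queue_all_runs = false
}
```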
Does that help?
Thanks for the reply @chrisarcand. I considered doing a manual trigger previously but wanted to get around having to implement additional logic. I guess I could have a null resource with a local-exec provisioner to kickstart the build.
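Something like this rough, untested sketch is what I have in mind. It assumes a `TFE_TOKEN` environment variable is exported and that `tfe_workspace.test` / `tfe_variable.test` are defined elsewhere; it queues the first run via the Runs API once the variables exist:

```hcl
resource "null_resource" "first_run" {
  # Wait until the variables are in place before queuing the run.
  depends_on = [tfe_variable.test]

  provisioner "local-exec" {
    command = <<-EOT
      curl -sS \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        --request POST \
        --data '{"data": {"attributes": {"message": "Initial run"}, "relationships": {"workspace": {"data": {"type": "workspaces", "id": "${tfe_workspace.test.id}"}}}}}' \
        https://app.terraform.io/api/v2/runs
    EOT
  }
}
```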
@chac-dee how did you end up solving this issue? We've been seeing the initial run created due to `queue_all_runs` fail sometimes because the variables were not set before the run was kicked off.
> I assume you are using the VCS-driven run workflow and a commit is triggering a run before your required settings (e.g. variables) are created, based on your description.
@chrisarcand we are having the same issue, but this assumption isn't quite right. The run that fails is not triggered by a commit, but by a race condition between setting variables on the workspace through this provider and `queue_all_runs` being set to true.
This simple configuration, like the one in the demo, could potentially cause the issue if the configuration requires that variable:
resource "tfe_workspace" "test" {
name = "my-workspace-name"
organization = tfe_organization.test.id
}
resource "tfe_variable" "test" {
key = "my_key_name"
value = "my_value_name"
category = "terraform"
workspace_id = tfe_workspace.test.id
description = "a useful description"
}
Could there be a way to set the workspace variables at the same time as creating the workspace, so that the auto-enqueued run happens with these already set? We wouldn't want to lose the ability to trigger this initial run through Terraform, if possible.
@pedroslopez So we have a manually created Terraform Cloud workspace which acts as the "parent" workspace that generates all the other workspaces. This is the only manually created workspace, and it essentially acts as the bootstrap. We have set up notifications on this parent workspace which, upon successful resource creation (workspace + variables etc.), send a notification to an endpoint we run that processes the JSON payload. After validating the notification, the endpoint uses a TFE token to create a run on the recently created workspace via the Runs API. This ensures a run is only triggered once we are sure all the resources have already been created.
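Roughly, the notification side of that looks like this sketch. The endpoint URL and names are illustrative, and the parent workspace is looked up via a data source since it was created manually:

```hcl
data "tfe_workspace" "parent" {
  name         = "bootstrap"  # assumed name of the manually created parent
  organization = "my-org"     # assumed organization name
}

resource "tfe_notification_configuration" "bootstrap_applied" {
  name             = "bootstrap-run-trigger"
  enabled          = true
  destination_type = "generic"
  url              = "https://example.com/tfc-webhook" # our processing endpoint
  triggers         = ["run:completed"]
  workspace_id     = data.tfe_workspace.parent.id
}
```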
I have the same issue.
I do use a VCS-driven workspace and I ensure all the files are in the repo before the workspace is created, so the repo has valid TF config files inside. It just occasionally fails if the workspace vars are not added quickly enough.
Using variable sets for the majority of vars does seem to reduce this, but it's not eliminated completely and it isn't a proper solution.
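For anyone trying the same mitigation, the variable-set approach looks roughly like this (all names here are illustrative, not from my actual config):

```hcl
resource "tfe_variable_set" "shared" {
  name         = "shared-vars"
  organization = "my-org" # assumed organization name
}

# Variables attached to the set exist independently of any one workspace.
resource "tfe_variable" "aws_region" {
  key             = "aws_region"
  value           = "us-east-1"
  category        = "terraform"
  variable_set_id = tfe_variable_set.shared.id
}

# Attach the set to the workspace.
resource "tfe_workspace_variable_set" "attach" {
  workspace_id    = tfe_workspace.test.id # assumed workspace resource
  variable_set_id = tfe_variable_set.shared.id
}
```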
I, too, do not want to set `queue_all_runs` to false and then have to use some other method to trigger runs.
What would probably be a nice, elegant solution is something like a `tfe_workspace_run` resource, on which you could set a `depends_on` for all the variables etc.; then I would be happy to set `queue_all_runs` to false. (See the sketch below.)
This would also probably mean you could do some other fun things, like having runs triggered based on `time_rotating` or some other trigger, so that workspaces can have runs based on something other than a VCS change.
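Purely as an illustration of the idea (no such resource exists in the provider at this point, so everything here is hypothetical):

```hcl
# Hypothetical resource sketch; not part of the tfe provider today.
resource "tfe_workspace_run" "initial" {
  workspace_id = tfe_workspace.test.id

  # Only queue the run once every dependent resource is in place.
  depends_on = [
    tfe_variable.test,
  ]
}
```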
At present I just trigger these workspaces by changing the files in them, and the VCS webhook then runs the workspace, but it would be nice to not have to make commits for that to happen. (I do trigger runs with API calls in some situations, though I'm avoiding that for these specific workspaces at the moment.)
I think the proposed solution is related to this issue: https://github.com/hashicorp/terraform-provider-tfe/issues/534
@rhughes1 yep, it's been a while, but we solved this by using the `multispace_run` resource from https://github.com/mitchellh/terraform-provider-multispace, setting dependencies on the variables and `queue_all_runs = false` on the workspace resource.
resource "multispace_run" "run" {
for_each = tfe_workspace.app
organization = var.tfe_organization_name
workspace = each.value.name
# wait for all vars to be set before triggering run
depends_on = [
tfe_variable.base_workspace_name,
tfe_variable.environment,
tfe_variable.aws_region,
tfe_variable.aws_access_key_id,
tfe_variable.aws_secret_access_key,
tfe_variable.uptimerobot_api_key,
tfe_variable.sentry_token,
]
}
Would be great to have this built in to the TFE provider.
I think you'll agree with me that `multispace_run` with `queue_all_runs = false` is a good solution to this issue, and I will close this as a duplicate of feature request #742.