terraform-provider-tfe

Delay premature workspace run before dependent resources are available e.g. workspace variables

Open chac-dee opened this issue 3 years ago • 4 comments

When creating a workspace in Terraform Cloud, a run will occasionally kick off before the workspace's variables and other resources, such as notification configuration, have been successfully added, causing the run to fail. This is essentially a race condition. Manually starting a run afterwards works, since all the configuration is in place by then.

It would be beneficial to have some logic that checks whether dependent resources such as workspace variables have already been created, or some sort of delay attribute.

Thanks in advance!

chac-dee, Aug 11 '21 12:08

Hello!

I assume you are using the VCS-driven run workflow and a commit is triggering a run before your required settings (e.g. variables) are created, based on your description.

This is what the queue_all_runs argument on tfe_workspace is for:

queue_all_runs - (Optional) Whether the workspace should start automatically performing runs immediately after its creation. When set to false, runs triggered by a webhook (such as a commit in VCS) will not be queued until at least one run has been manually queued. Defaults to true. Note: This default differs from the Terraform Cloud API default, which is false. The provider uses true as any workspace provisioned with false would need to then have a run manually queued out-of-band before accepting webhooks.

With this set to false, you can trigger a run yourself (in the UI or through scripting/automation) once everything is in place, at which point the workspace begins accepting webhooks from VCS. This is useful if you don't control the VCS repository and can't guarantee that nobody will commit to the configuration while you're still creating the workspaces that will provision it. (If you do control it, I would restrict this. You're right that it's an out-of-band race condition, and the provider can't really handle it other than by ignoring webhooks through this option until you say all is ready!)
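As a minimal sketch, disabling the automatic initial run might look like the following. The workspace name and organization below are placeholders, not values from this thread.

```hcl
# Sketch only: the name and organization values are placeholders.
resource "tfe_workspace" "example" {
  name         = "my-workspace-name"
  organization = "my-org-name"

  # Do not queue a run on creation, and ignore VCS webhooks
  # until a first run has been queued manually.
  queue_all_runs = false
}
```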

Does that help?

chrisarcand, Aug 11 '21 14:08

Thanks for the reply @chrisarcand. I considered doing a manual trigger previously but wanted to get around having to implement additional logic. I guess I could have a null resource with a local-exec provisioner to kickstart the build.
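A rough sketch of that idea, under several assumptions: the hashicorp/null provider is available, curl is installed wherever Terraform runs, the workspace lives on app.terraform.io, and a token is passed in through a hypothetical var.tfe_token. The resource names tfe_workspace.example and tfe_variable.example are also placeholders for resources defined elsewhere in the configuration. The request creates a run through the Runs API.

```hcl
# Hypothetical variable holding a Terraform Cloud API token.
variable "tfe_token" {
  type      = string
  sensitive = true
}

resource "null_resource" "initial_run" {
  # Wait until the workspace variables exist before queuing the run.
  # tfe_variable.example is a placeholder name.
  depends_on = [tfe_variable.example]

  provisioner "local-exec" {
    # The token is handed to the shell via the environment so it
    # does not appear in the command string.
    environment = {
      TFE_TOKEN = var.tfe_token
    }

    command = <<-EOT
      curl --fail --silent \
        --request POST \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        --data '{"data":{"type":"runs","relationships":{"workspace":{"data":{"type":"workspaces","id":"${tfe_workspace.example.id}"}}}}}' \
        https://app.terraform.io/api/v2/runs
    EOT
  }
}
```

Because the provisioner only runs when the null_resource is created, this queues a single initial run and does nothing on later applies.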

chac-dee, Aug 11 '21 23:08

@chac-dee how did you end up solving this issue? We've been seeing the initial run created by queue_all_runs fail sometimes because the variables were not set before the run was kicked off.

> I assume you are using the VCS-driven run workflow and a commit is triggering a run before your required settings (e.g. variables) are created, based on your description.

@chrisarcand we are having the same issue, but this assumption isn't quite right. The run that's failing is not due to commits, but rather due to a race condition with setting variables on the workspace through this provider, and queue_all_runs being set to true.

This simple configuration in the demo could potentially cause the issue if the configuration requires that variable:

resource "tfe_workspace" "test" {
  name         = "my-workspace-name"
  organization = tfe_organization.test.id
}

resource "tfe_variable" "test" {
  key          = "my_key_name"
  value        = "my_value_name"
  category     = "terraform"
  workspace_id = tfe_workspace.test.id
  description  = "a useful description"
}

Could there be a way to set the workspace variables at the same time as creating the workspace somehow, so that the auto-enqueued run happens with these already being set? We wouldn't want to lose triggering this initial run through terraform if possible.

pedroslopez, Oct 19 '21 19:10

@pedroslopez So we have a manually created Terraform Cloud workspace which acts as the "parent" workspace that generates all the other workspaces. It is the only manually created workspace and essentially acts as the bootstrap. We have set up notifications on this parent workspace: upon successful resource creation (workspace + variables etc.), it sends a notification to an endpoint of ours, which processes the JSON payload. After validating the notification, the endpoint uses a TFE TOKEN to create a run on the newly created workspace via the Runs API. This ensures a run is only triggered once we are sure all the resources have been created.
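The notification half of that setup could be sketched roughly as below. This is an illustration rather than the actual configuration used here: the endpoint URL, the resource name, and var.parent_workspace_id are all hypothetical, and the receiving service that validates the payload and calls the Runs API is not shown.

```hcl
# Hypothetical input: the ID of the manually created parent workspace.
variable "parent_workspace_id" {
  type = string
}

resource "tfe_notification_configuration" "bootstrap_completed" {
  name             = "bootstrap-run-completed"
  enabled          = true
  destination_type = "generic"

  # Notify only when a run on the parent workspace has completed,
  # i.e. after the child workspaces and their variables were applied.
  triggers = ["run:completed"]

  # Placeholder URL for the endpoint that validates the payload
  # and then creates runs on the new workspaces.
  url          = "https://example.com/hooks/terraform-cloud"
  workspace_id = var.parent_workspace_id
}
```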

chac-dee, Oct 20 '21 09:10

I have the same issue.

I do use a VCS-driven workspace, and I ensure all the files are in the repo before the workspace is created, so the repo has valid TF config files inside. It just occasionally fails if the workspace vars are not added quickly enough.

Using variable sets for the majority of vars does seem to reduce this, but it doesn't eliminate it completely and isn't a proper solution.

I, too, do not want to set queue_all_runs to false and have to use some other method to trigger runs.

A nice, elegant solution would probably be something like a tfe_workspace_run resource, on which you could set depends_on for all the variables etc.; then I would be happy to set queue_all_runs to false.
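To make the idea concrete, the proposed shape might look something like the sketch below. This is purely illustrative of the request: the resource type and its workspace_id argument are hypothetical here, as are the tfe_workspace.example and tfe_variable.example names.

```hcl
# Hypothetical resource illustrating the proposal, not an existing API
# as far as this thread is concerned.
resource "tfe_workspace_run" "initial" {
  workspace_id = tfe_workspace.example.id

  # Only queue the run once every required variable has been created.
  depends_on = [
    tfe_variable.example,
  ]
}
```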

This would probably also allow some other fun things, like having runs triggered by time_rotating or some other trigger, so that workspaces can have runs based on something other than a VCS change. At present I just trigger these workspaces by changing the files in them, and the VCS webhook then runs the workspace, but it would be nice to not have to make commits for that to happen. (I do trigger runs with API calls in some situations, but I'm avoiding that for these specific ones at the moment.)

paul-hugill, Nov 05 '22 07:11

I think the proposed solution is related to this issue: https://github.com/hashicorp/terraform-provider-tfe/issues/534

rhughes1, Dec 19 '22 19:12

@rhughes1 yep, it's been a while but we solved this by using https://github.com/mitchellh/terraform-provider-multispace 's multispace_run resource, setting dependencies on the variables and queue_all_runs=false on the workspace resource.

resource "multispace_run" "run" {
  for_each     = tfe_workspace.app
  organization = var.tfe_organization_name
  workspace    = each.value.name

  # wait for all vars to be set before triggering run
  depends_on = [
    tfe_variable.base_workspace_name,
    tfe_variable.environment,
    tfe_variable.aws_region,
    tfe_variable.aws_access_key_id,
    tfe_variable.aws_secret_access_key,
    tfe_variable.uptimerobot_api_key,
    tfe_variable.sentry_token,
  ]
}

Would be great to have this built in to the TFE provider.

pedroslopez, Dec 19 '22 20:12

I think you'll agree with me that multispace_run with queue_all_runs=false is a good solution to this issue, so I will close this as a duplicate of feature request #742.

brandonc, Dec 23 '22 15:12