
Circular dependency when using managed resources in a provider config

Open fnordian opened this issue 3 years ago • 15 comments

Creating a Kubernetes cluster, using that resource to configure a kubernetes provider, and then creating a pod with it seems to create a circular dependency.

The initial apply works, but when a change forces replacement of the pod, the apply that performs the forced destroy fails with a cycle error.

Terraform Version

Terraform v1.1.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/azuread v2.16.0
+ provider registry.terraform.io/hashicorp/azurerm v2.91.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.7.1
+ provider registry.terraform.io/hashicorp/random v3.1.0

and

Terraform v1.1.5
on linux_amd64
+ provider registry.terraform.io/hashicorp/azuread v2.16.0
+ provider registry.terraform.io/hashicorp/azurerm v2.91.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.7.1
+ provider registry.terraform.io/hashicorp/random v3.1.0

Terraform Configuration Files


resource "azurerm_kubernetes_cluster" "services" {
  name                = "service-aks1"
  location            = data.azurerm_resource_group.resource_group.location
  resource_group_name = data.azurerm_resource_group.resource_group.name
  dns_prefix          = "aks1"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
    user_assigned_identity_id = azurerm_user_assigned_identity.aks_identity.id
  }

  tags = {
    Environment = var.environment_name
  }

}
provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.services.kube_config.0.host
  username               = azurerm_kubernetes_cluster.services.kube_config.0.username
  password               = azurerm_kubernetes_cluster.services.kube_config.0.password
  client_certificate     = base64decode(azurerm_kubernetes_cluster.services.kube_config.0.client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.services.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.services.kube_config.0.cluster_ca_certificate)

}


resource "kubernetes_pod" "test" {
  metadata {
    name = "terraform-example2"
  }

  spec {
    container {
      image = "nginx:1.7.9"
      name  = "example"
    }
  }
}

...

Expected Behavior

pod should be updated/replaced

Actual Behavior

│ Error: Cycle: module.infrastructure.module.kubernetes.kubernetes_pod.test (destroy), module.infrastructure.module.kubernetes.azurerm_kubernetes_cluster.services, module.infrastructure.module.kubernetes.provider["registry.terraform.io/hashicorp/kubernetes"]

Steps to Reproduce

  1. terraform init
  2. terraform apply
  3. some change to the pod that forces replacement (see the example after these steps)
  4. terraform apply
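
For example, renaming the pod is one change that forces replacement (a hypothetical edit to the configuration above):

resource "kubernetes_pod" "test" {
  metadata {
    name = "terraform-example3" # changing the name forces the pod to be replaced
  }

  spec {
    container {
      image = "nginx:1.7.9"
      name  = "example"
    }
  }
}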

fnordian avatar Feb 03 '22 11:02 fnordian

Hi @fnordian,

Thanks for filing the issue. The configuration here isn't showing any relationship from the azurerm_kubernetes_cluster to the kubernetes provider. The cycle reported may have been reduced to its minimum size, but it could still involve resources from portions of the configuration that have been left out. If you are unsure where the cycle is arising, can you supply a more complete example of the configuration?

Thanks!

jbardin avatar Feb 03 '22 13:02 jbardin

The relationship comes through the provider's parameters, doesn't it?

e.g.

host = azurerm_kubernetes_cluster.services.kube_config.0.host

fnordian avatar Feb 03 '22 15:02 fnordian

The dependency in that direction should not result in a cycle on its own, though it could be something to investigate. I suspect, however, that there are other resources at play which may have contributed to the error.

I would like to note here that the configuration as shown is not a recommended pattern, since you are passing managed resources into a provider configuration. Operations on this type of configuration often cannot be completed in a single apply and require breaking up the process using -target, which means it's better to manage the individual configuration layers as separate, independent configurations.
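
For illustration, that split might look roughly like the following, with the cluster layer exposing its credentials as outputs for a separate kubernetes layer to consume (the output names here are placeholders, not from this configuration):

output "kube_host" {
  value     = azurerm_kubernetes_cluster.services.kube_config.0.host
  sensitive = true
}

output "kube_cluster_ca_certificate" {
  value     = azurerm_kubernetes_cluster.services.kube_config.0.cluster_ca_certificate
  sensitive = true
}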

jbardin avatar Feb 03 '22 16:02 jbardin

The trace logs from the operation causing the error can also help diagnose the problem. The azurerm provider is quite verbose, but we are only interested in the core logs here so setting TF_LOG_CORE=trace will get the graph building details we need. The output is still fairly large though, so it's better to create a separate gist with the content.
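
For example, one way to capture just the core logs (Terraform writes its logs to stderr, so redirect that to a file):

TF_LOG_CORE=trace terraform apply 2> trace.log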

jbardin avatar Feb 03 '22 16:02 jbardin

here's the trace. I've cut away the state dumps containing secrets. Hope it's still helpful. I've also attached the graph as an svg.

https://gist.github.com/fnordian/903c683f7fbe86071fcb4995b680e7eb graph

fnordian avatar Feb 03 '22 16:02 fnordian

@fnordian, you appear to have dropped a character when copying the gist URL.

jbardin avatar Feb 03 '22 16:02 jbardin

sorry, fixed. (it was a b)

fnordian avatar Feb 03 '22 16:02 fnordian

Thanks @fnordian. I wasn't counting on there being a change to the azurerm_kubernetes_cluster resource itself, but that of course is why the node is present in this cycle after all.

The problem here is a form of what I described earlier: the managed resources being used in the provider configuration prevent the entire config from being applied in a single operation. Unfortunately this results in a cycle during apply rather than being detected in a way that could be presented more clearly to the user during plan, which is one of the reasons this type of config is not recommended.

The cycle appears because the kubernetes_pod creation depends on the azurerm_kubernetes_cluster, which in turn means the update to azurerm_kubernetes_cluster must wait for the destruction of the old kubernetes_pod. Having the provider interposed between these two operations is what introduces the cycle.

I think a workaround here would be to apply a targeted change to azurerm_kubernetes_cluster to ensure it's not present in the dependency graph when the kubernetes resources need to be replaced.
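
Sketched as a two-step apply (the resource address is abbreviated here; include the full module path, as shown in the error message, if the resource lives in a module):

  1. terraform apply -target=azurerm_kubernetes_cluster.services
  2. terraform apply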

jbardin avatar Feb 03 '22 19:02 jbardin

Using a data source in between the resource and the provider works around that problem.

Although now I am running into what seems to be https://github.com/hashicorp/terraform-provider-kubernetes/issues/1028

data "azurerm_kubernetes_cluster" "services" {
  name = var.cluster_name
  resource_group_name = var.resource_group_name
  depends_on = [azurerm_kubernetes_cluster.services]
}


provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.services.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.services.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.services.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.services.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.services.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.services.kube_config.0.cluster_ca_certificate)
}


fnordian avatar Feb 04 '22 07:02 fnordian

Using the data source does avoid the cycle by disconnecting the direct relationship between the resources, but as you can see, because that relationship is no longer present, you are going to have ordering issues associated with having a managed resource and a data source representing the same logical resource in the configuration.

This is not solvable within a single Terraform configuration, so we can use this issue to represent the situation and work on better error reporting to help direct users toward working configurations. The fact that Terraform doesn't fail until apply, and then only reports a hard-to-understand cycle, is definitely a usability concern. The recommended solution is still going to be to use multiple independent configurations so that the lifecycles of the resources are not so closely tied together.
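
As a sketch of what the consuming layer could look like once split out, assuming the cluster layer publishes its credentials as outputs and its state is read with terraform_remote_state (the backend settings and output names are assumptions for illustration):

data "terraform_remote_state" "cluster" {
  backend = "azurerm"

  config = {
    resource_group_name  = "tfstate"        # assumed backend settings for the cluster layer's state
    storage_account_name = "tfstatestorage"
    container_name       = "tfstate"
    key                  = "cluster.tfstate"
  }
}

provider "kubernetes" {
  # The remaining credentials (client certificate/key or a token) would follow the same pattern.
  host                   = data.terraform_remote_state.cluster.outputs.kube_host
  cluster_ca_certificate = base64decode(data.terraform_remote_state.cluster.outputs.kube_cluster_ca_certificate)
}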

jbardin avatar Feb 04 '22 14:02 jbardin

Please note that the failure doesn't show up on the initial apply, but only on the update.

As provider declarations allow references to other resources, it's hard to understand where the limits of Terraform's dependency resolution lie. Could you point to documentation that explains it?

At least from my perspective as a user, it would be great if Terraform were able to handle these situations properly rather than relying on the user for orchestration.

fnordian avatar Feb 09 '22 02:02 fnordian

For reference, this is documented in Provider Configuration

You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources

Due to compatibility constraints we are not able to statically detect and error out on these types of references, but the management of these multi-layered configurations is something we're thinking about approaching via other means.
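
Following that guidance, a minimal sketch of a kubernetes provider block that only references values known before apply (the variable names are placeholders):

variable "kube_host" {
  type = string
}

variable "kube_cluster_ca_certificate" {
  type      = string
  sensitive = true
}

provider "kubernetes" {
  # Only input variables are referenced, so the provider can be configured
  # during plan without depending on a managed resource's attributes.
  host                   = var.kube_host
  cluster_ca_certificate = base64decode(var.kube_cluster_ca_certificate)
}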

jbardin avatar Feb 09 '22 13:02 jbardin

I have the same problem. How can it be solved? Thanks.

zhan16061 avatar Aug 10 '22 16:08 zhan16061

I'm running into the same issue.

It would be interesting to understand more about the internals and why providers can't support attributes exported by resources.

Also in reference to

the management of these multi-layered configurations is something we're thinking about approaching via other means.

What approach are you evaluating for this? Is there any roadmap / timeline?

The possibility to lazily load/configure providers that depend on dynamic information would greatly improve the user experience.

GiuseppeChiesa-TomTom avatar Oct 13 '22 17:10 GiuseppeChiesa-TomTom

Hi @GiuseppeChiesa-TomTom,

Most instances of this type of cycle should be fixed in a current release (or an upcoming release if it was triggered by v1.3-specific changes).

The underlying problem with this setup is when a provider requires configuration to create a plan (common with providers like kubernetes), and that configuration depends on a resource attribute which is unknown during the plan. The only way around this is to separately apply the resource changes, then plan again using the dependent provider. With the given design of Terraform, planning and applying these individually is currently best done with separate configurations.

Unfortunately we don't have a public roadmap, but considering the experimental nature of any new approaches, it would be hard to offer a timeline.

jbardin avatar Oct 20 '22 21:10 jbardin

@fnordian I had a very similar issue with a managed resource in a provider config and v1.3.4 just fixed it

Mike-Nahmias avatar Nov 04 '22 17:11 Mike-Nahmias

Closing since the cycle errors should be resolvable in current releases. The logical problems of using a managed resource in a provider configuration still stand, but that is outside of this issue.

jbardin avatar Mar 11 '24 18:03 jbardin

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Apr 11 '24 02:04 github-actions[bot]