
Provider produced inconsistent final plan: produced an invalid new value for .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now cty.StringVal("cattle-global-data:cc-694ng").

andersjohansson2021 opened this issue 3 years ago • 10 comments

When terraforming an RKE2 cluster I receive the following:

│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for rancher2_cluster_v2.test-cluster to include new values learned so far during apply, provider
│ "registry.terraform.io/rancher/rancher2" produced an invalid new value for
│ .rke_config[0].machine_pools[1].cloud_credential_secret_name: was cty.StringVal(""), but now
│ cty.StringVal("cattle-global-data:cc-trrz8").
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

This happens with version 1.22.1 of the provider but not with 1.21.0. /Anders.

SURE-5412

andersjohansson2021 avatar Jan 03 '22 10:01 andersjohansson2021

Any update on this? I'm seeing the same issue.

│ When expanding the plan for module.cluster.rancher2_cluster_v2.cluster to include new values learned so far
│ during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid new value for
│ .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now
│ cty.StringVal("cattle-global-data:cc-f9hbf").
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

frouzbeh avatar Feb 16 '22 06:02 frouzbeh

@frouzbeh I had to use an older version as stated above. That solved the issue for me. But having said that, this bug needs to be addressed in the provider.
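For anyone pinning to the older provider as a stopgap, a minimal required_providers sketch is shown below (the 1.21.0 version number is taken from the report above; pin whichever version works for you):

terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "1.21.0"
    }
  }
}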

andersjohansson2021 avatar Feb 16 '22 07:02 andersjohansson2021

@andersjohansson2021 Thank you, yes, I tested with an older version and it works, but as I remember the older version had another issue with the kube-config generation function, which is fixed in 1.22.2. I hope they fix it soon.

frouzbeh avatar Feb 16 '22 18:02 frouzbeh

@rawmind0 Would you please take a look at this issue? We would really like to make this work.

frouzbeh avatar Feb 16 '22 22:02 frouzbeh

+1 here as well

git-ival avatar May 06 '22 19:05 git-ival

Hello, is there any workaround or fix for this issue? I am stuck on it.

PrakashFromBunnings avatar Jun 09 '22 08:06 PrakashFromBunnings

@frouzbeh I had to use an older version as stated above. That solved the issue for me. But having said that, this bug needs to be addressed in the provider.

The older version brings other issues, like missing or unsupported arguments, etc.

PrakashFromBunnings avatar Jun 09 '22 08:06 PrakashFromBunnings

Hello, this issue is also causing problems on my deploys. Is there a commitment to fix it?

nfsouzaj avatar Aug 02 '22 13:08 nfsouzaj

Reproduced the issue on v2.6-head (c54b655), cloud provider: Linode.

  • I ran into the same issue when I tried to create an rke2 node driver cluster on k8s v1.23.10+rke2r1.
  • Works fine when we create an rke2 node driver cluster on k8s v1.22.13+rke2r1 and an rke1 node driver cluster on k8s v1.23.10-rancher1-1.
  • From the terraform script: created a cloud credential, then added a machine pool and machine_global_config in the rke_config while creating the cluster from terraform. Errors out:
Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for rancher2_cluster_v2.rke2-cluster-tf to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid new
│ value for .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now cty.StringVal("cattle-global-data:<redacted>").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵

anupama2501 avatar Sep 01 '22 11:09 anupama2501

As a workaround I tried this and it worked: create a cloud credential and then grab the cloud credential id using a data block.

Example:

resource "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = var.cloud_credential_name
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
    default_region = var.aws_region
  }
}
data "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = var.cloud_credential_name
}

Then use data.rancher2_cloud_credential.rancher2_cloud_credential.id in rancher2_cluster_v2 machine configs.
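For illustration, a minimal sketch of how that reference might look inside a machine pool; the cluster name, kubernetes_version, and the rancher2_machine_config_v2 reference are placeholders, not taken from this thread:

resource "rancher2_cluster_v2" "example" {
  name               = "example-rke2"
  kubernetes_version = "v1.24.4+rke2r1"
  rke_config {
    machine_pools {
      name                         = "pool1"
      # reference the data source instead of the resource
      cloud_credential_secret_name = data.rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.example.kind
        name = rancher2_machine_config_v2.example.name
      }
    }
  }
}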

Note: it seems this only works if the cloud credential has been created beforehand.

izaac avatar Sep 28 '22 16:09 izaac

Still present in version 1.24.2 when creating RKE2 downstream clusters on Azure:

Error: Provider produced inconsistent final plan

When expanding the plan for rancher2_cluster_v2.cluster_az to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid new value for .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now cty.StringVal("cattle-global-data:cc-ffs8c").

This is a bug in the provider, which should be reported in the provider's own issue tracker.

Error: Provider produced inconsistent final plan

When expanding the plan for rancher2_cluster_v2.cluster_az to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid new value for .rke_config[0].machine_pools[0].name: was cty.StringVal(""), but now cty.StringVal("pool-b94345").

This is a bug in the provider, which should be reported in the provider's own issue tracker.

Looking at #878, I don't believe that it will fix both plan inconsistencies

chfrank-cgn avatar Oct 24 '22 10:10 chfrank-cgn

Still seeing this error on v1.25.0:

│ 
│ When expanding the plan for rancher2_cluster_v2.utility to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2"
│ produced an invalid new value for .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now
│ cty.StringVal("cattle-global-data:cc-pmzs7").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

It usually works the second time.

matttrach avatar Dec 20 '22 06:12 matttrach

Running terraform apply a second time also consistently works for me.

On the terraform destroy I also had a dependency issue, but that could be fixed by adding:

depends_on = [rancher2_cloud_credential.my_cloud_credential]

to the rancher2_cluster_v2 resource.
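For illustration, a minimal sketch of where that line sits (the resource and attribute names here are placeholders):

resource "rancher2_cluster_v2" "cluster" {
  name = "my-cluster"
  # ... rke_config, machine_pools, etc. ...

  # ensure the credential exists before the cluster is created or destroyed
  depends_on = [rancher2_cloud_credential.my_cloud_credential]
}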

Is there another depends_on-like or sleep-like thing you could do to get the apply working on the first try?

sebracs avatar Jan 05 '23 15:01 sebracs

Yes - I can confirm that it works the second time, most likely because the credential is already there from the first try. Thanks for the hint about the dependency!

chfrank-cgn avatar Jan 06 '23 15:01 chfrank-cgn

Facing this as well. Any plan to fix this bug?

As noted here, the only workaround is to create the cloud credential before running terraform apply, either in a separate apply or manually via the UI. Otherwise, my automation to create clusters fails.
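One way to script that two-step flow, rather than clicking through the UI, is Terraform's -target flag; a sketch, assuming the credential resource is named rancher2_cloud_credential.foo as in the configs later in this thread:

# apply only the cloud credential first, then apply the full configuration
terraform apply -target=rancher2_cloud_credential.foo
terraform apply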

moshiaiz avatar Jan 06 '23 19:01 moshiaiz

Hello @moshiaiz,

I am working on this. Thank you all for your patience.

Terraform rancher2 provider builds against Rancher 2.7 are currently blocked for us due to https://github.com/rancher/terraform-provider-rancher2/issues/1052. We need to branch and fix our build before I can reproduce this issue.

From my investigation, there is indeed a bug in the way the provider processes the value for .rke_config[0].machine_pools[0].cloud_credential_secret_name. From the terraform docs, this field exists in both the cluster_v2 resource and its machine pool, but offhand I will need to find out why it's present in the machine pool. When connecting to a rancher instance, only one cloud credential is needed, so it may be a duplicate field.

This old PR is a potential fix https://github.com/rancher/terraform-provider-rancher2/pull/878 and should also fix https://github.com/rancher/terraform-provider-rancher2/issues/915 since rke is being installed on vSphere and this appears to be a bug in the terraform rke config.

a-blender avatar Jan 09 '23 21:01 a-blender

The main PR for https://github.com/rancher/terraform-provider-rancher2/issues/1052 has been merged and the TF build is fixed. Testing is unblocked. Trying to reproduce this for an RKE2 cluster on provider version 1.25.0.

a-blender avatar Jan 26 '23 15:01 a-blender

Reproduced this issue on an Amazon EC2 RKE2 cluster with TF provider 1.25.0.

main.tf

terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "1.25.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url 
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

# Create amazonec2 cloud credential
resource "rancher2_cloud_credential" "foo" {
  name = "foo"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}

# Create amazonec2 machine config v2
resource "rancher2_machine_config_v2" "foo" {
  generate_name = "ablender-machine"
  amazonec2_config {
    ami            = var.aws_ami
    region         = var.aws_region
    security_group = [var.aws_security_group_name]
    subnet_id      = var.aws_subnet_id
    vpc_id         = var.aws_vpc_id
    zone           = var.aws_zone_letter
  }
}

# Create a new rancher v2 amazonec2 RKE2 Cluster v2
resource "rancher2_cluster_v2" "ablender-rke2" {
  name = var.rke2_cluster_name
  kubernetes_version = "v1.25.6-rancher1-1"
  enable_network_policy = false
  default_cluster_role_for_project_members = "user"
  cloud_credential_secret_name = rancher2_cloud_credential.foo.id
  rke_config {
    machine_pools {
      name = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo.id
      control_plane_role = true
      etcd_role = true
      worker_role = true
      quantity = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo.kind
        name = rancher2_machine_config_v2.foo.name
      }
    }
  }
}

The error showed up on the first terraform apply.

It worked when running terraform apply a second time, as posted above, so that is a valid workaround.

a-blender avatar Feb 07 '23 15:02 a-blender

Investigation

After more digging, I've discovered that this error is the same as (or similar to) a very popular error in the Terraform AWS provider, https://github.com/hashicorp/terraform-provider-aws/issues/19583, which has been very active over the past two years and which Hashicorp refuses to acknowledge or fix.

I discovered this error in the TF debug logs

2023-02-10T13:43:54.804-0500 [WARN]  Provider "terraform.example.com/local/rancher2" produced an invalid plan for rancher2_cluster_v2.ablender-rke2, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .fleet_namespace: planned value cty.StringVal("fleet-default") for a non-computed attribute
      - .rke_config[0].machine_selector_config: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
      - .rke_config[0].etcd: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
      - .rke_config[0].machine_pools[0].cloud_credential_secret_name: planned value cty.StringVal("") does not match config value cty.UnknownVal(cty.String)

From poking around, and according to Hashicorp (https://discuss.hashicorp.com/t/context-around-the-log-entry-tolerating-it-because-it-is-using-the-legacy-plugin-sdk/1630), most of these warnings are due to an expected SDK compatibility quirk, but the error for cloud_credential_secret_name is causing the apply to fail.

The full error ends with this:

2023-02-10T13:26:42.997-0500 [ERROR] vertex "rancher2_cluster_v2.ablender-rke2" error: Provider produced inconsistent final plan
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for rancher2_cluster_v2.ablender-rke2 to include new values learned so far during apply,
│ provider "terraform.example.com/local/rancher2" produced an invalid new value for
│ .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now
│ cty.StringVal("cattle-global-data:cc-mzjcm").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

Root cause

Something in the backend of the terraform-plugin-sdk is computing a planned value of "" for cloud_credential_secret_name when it is set to Required and set as a string in the config file. This cannot be fixed in the Terraform provider; it appears to be a bug in the SDK that the provider is using.

Fix

I tried updating the Terraform plugin SDK and that did not work, but setting the machine pool cloud_credential_secret_name as Optional does fix it. This patch allows us to retain parity between Rancher and the Terraform provider and may be the most viable option to fix this for the scores of customers who have been running into this issue every few weeks. I will update my draft PR shortly.
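For illustration only, a minimal Go sketch of the kind of schema change described here; this is not the provider's actual source, and the machinePoolFields function name is hypothetical. It assumes terraform-plugin-sdk v1:

package provider

import "github.com/hashicorp/terraform-plugin-sdk/helper/schema"

// machinePoolFields sketches a machine pool schema where the credential
// field is Optional instead of Required, so a value that is only known
// after apply no longer conflicts with a planned empty string.
func machinePoolFields() map[string]*schema.Schema {
	return map[string]*schema.Schema{
		"cloud_credential_secret_name": {
			Type:     schema.TypeString,
			Optional: true, // previously Required
		},
		// ... other machine pool fields omitted ...
	}
}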

a-blender avatar Feb 10 '23 23:02 a-blender

Testing template

Root cause

When creating an RKE2 cluster via Terraform on any hosted provider (Amazon EC2, Azure, Linode driver so far), Terraform computes a new value for a duplicate field cloud_credential_secret_name in the machine pool and then throws an error on a terraform apply pertaining to that value.

What was fixed, or what changes have occurred

This PR has the following fixes:

  • Update machine_pool.cloud_credential_secret_name to be Optional. This keeps parity with Terraform and fixes the plan bug
  • Update docs
  • Update Terraform plugin SDK to 1.17.2

Areas or cases that should be tested

Test steps

  • Run rancher instance of v2.7-head
  • Provision an rke2 cluster on AWS EC2 nodes with all 3 permutations of rancher2_cluster_v2.cloud_credential_secret_name and rancher2_cluster_v2.rke_config.machine_pools.cloud_credential_secret_name set in main.tf (each one set, then both)
  • terraform init
  • terraform apply
  • Verify each cluster creates on the first run of terraform apply and provisions successfully
main.tf

terraform {
  required_providers {
    rancher2 = {
      source = "rancher/rancher2"
      version = "3.0.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url 
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

# Create amazonec2 cloud credential
resource "rancher2_cloud_credential" "foo" {
  name = "foo"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}

# Create amazonec2 machine config v2
resource "rancher2_machine_config_v2" "foo" {
  generate_name = "ablender-machine"
  amazonec2_config {
    ami            = var.aws_ami
    region         = var.aws_region
    security_group = [var.aws_security_group_name]
    subnet_id      = var.aws_subnet_id
    vpc_id         = var.aws_vpc_id
    zone           = var.aws_zone_letter
    root_size      = var.aws_root_size
  }
}

# Create a new rancher v2 amazonec2 RKE2 Cluster v2
resource "rancher2_cluster_v2" "ablender-rke2" {
  name = var.rke2_cluster_name
  cloud_credential_secret_name = rancher2_cloud_credential.foo.id // test case
  kubernetes_version = "v1.25.6+rke2r1"
  enable_network_policy = false
  default_cluster_role_for_project_members = "user"
  rke_config {
    machine_pools {
      name = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo.id // test case
      control_plane_role = true
      etcd_role = true
      worker_role = true
      quantity = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo.kind
        name = rancher2_machine_config_v2.foo.name
      }
    }
  }
}

What areas could experience regressions ?

Terraform rancher2 provider, rke1 provisioning

Are the repro steps accurate/minimal ?

Yes.

a-blender avatar Feb 17 '23 21:02 a-blender

Blocked -- waiting on Terraform 3.0.0 for Rancher v2.7.x.

a-blender avatar Feb 17 '23 21:02 a-blender

Thank you for the investigations so far. I tested 3.0.0-rc1 since I have the same problem, where the first apply fails and the second apply works.

In my case I could track it down to machine_global_config being built up with a "known after apply" value.

  rke_config {
    machine_global_config = yamlencode({
      cni = "calico"
      profile = "cis-1.6"
      tls-san = [
        module.vip_control_plane.fqdn,
      ]
    })
  }

If I remove the tls-san value, the problem doesn't happen on the first try.

Anything I could test or investigate?

2023-02-28T16:43:00.253+0100 [WARN]  Provider "local/rancher/rancher2" produced an invalid plan for module.cluster.rancher2_cluster_v2.cluster, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .fleet_namespace: planned value cty.StringVal("fleet-default") for a non-computed attribute
      - .rke_config[0].machine_global_config: planned value cty.StringVal("cni: calico\nprofile: cis-1.6\ntls-san:\n- generated-fqdn.int.example.com\n") does not match config value cty.StringVal("\"cni\": \"calico\"\n\"profile\": \"cis-1.6\"\n\"tls-san\":\n- \"generated-fqdn.int.example.com\"\n")
      - .rke_config[0].machine_pools: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
      - .rke_config[0].etcd: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
      - .rke_config[0].machine_selector_config: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
2023-02-28T16:43:00.253+0100 [ERROR] vertex "module.cluster.rancher2_cluster_v2.cluster" error: Provider produced inconsistent final plan
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.cluster.rancher2_cluster_v2.cluster to
│ include new values learned so far during apply, provider
│ "local/rancher/rancher2" produced an invalid new value for .rke_config:
│ block count changed from 0 to 1.
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.

lazyfrosch avatar Feb 28 '23 15:02 lazyfrosch

@sowmyav27 This is ready to test using Terraform rancher2 v3.0.0-rc1. Please set up local testing on the rc version of the provider with this command:

./setup-provider.sh rancher2 3.0.0-rc1

a-blender avatar Mar 02 '23 23:03 a-blender

@a-blender Shall I open a dedicated issue? I assume this is a general problem for "known during apply" values.

lazyfrosch avatar Mar 03 '23 11:03 lazyfrosch

Ticket #835 - Test Results - ✅

With Docker on a single-node instance using Rancher v2.7-64c5188a5394f7ef7858ebb6807072ad5abe0e80-head:

Verified with rancher2 provider v3.0.0-rc2:

  1. Fresh install of rancher v2.7-head
  2. Configure a resource block for cloud credentials + provision a downstream RKE2 EC2 cluster, referencing the cloud credential resource in the nodepool configuration
  3. Verified: no errors seen; cluster successfully provisions and destroys, as expected

Josh-Diamond avatar Apr 11 '23 22:04 Josh-Diamond