Failure when adding or removing cold or frozen tiers
Readiness Checklist
- [x] I am running the latest version
- [x] I checked the documentation and found no answer
- [x] I checked to make sure that this issue has not already been filed - possibly related to https://github.com/elastic/terraform-provider-ec/issues/336 and https://github.com/elastic/terraform-provider-ec/issues/343, @Kushmaro asked to file an issue
- [x] I am reporting the issue to the correct repository (for multi-repository projects)
Expected Behavior
Adding a cold or frozen tier to an existing deployment, or removing one from an existing deployment, should succeed.
Current Behavior
We get the error:
│ Error: failed updating deployment: 3 errors occurred:
│ * api error: clusters.cluster_invalid_plan: Instance configuration [gcp.es.datacold.n2.68x10x190] does not allow usage of node types [master,ingest]. You must either change instance configuration or use only allowed node types [data]. (resources.elasticsearch[0].cluster_topology[2].instance_configuration_id)
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = cold] (resources.elasticsearch[0])
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = hot_content] (resources.elasticsearch[0])
│
│
│
│ with ec_deployment.multi_tier,
│ on deployment.tf line 17, in resource "ec_deployment" "multi_tier":
│ 17: resource "ec_deployment" "multi_tier" {
This happens even when the topology blocks are defined in alphabetical order.
Steps to Reproduce
- Create a simple deployment using Terraform with `terraform apply -auto-approve` (it requires previously setting the EC API key with `EC_API_KEY="<ESS_API_KEY>"` and running `terraform init`):
terraform {
  required_version = ">= 0.12.29"

  required_providers {
    ec = {
      source  = "elastic/ec"
      version = "0.4.0"
    }
  }
}

provider "ec" {}

# Create an Elastic Cloud deployment
resource "ec_deployment" "multi_tier" {
  name                   = "multi_tier"
  region                 = "gcp-europe-west3"
  version                = "7.17.1"
  deployment_template_id = "gcp-storage-optimized"

  elasticsearch {
    autoscale = "false"

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      zone_count = 1
      size       = "2g"
    }
  }

  kibana {
    topology {
      size       = "1g"
      zone_count = 1
    }
  }
}
- We get a correct `terraform.tfstate`:
"id": "hot_content",
"instance_configuration_id": "gcp.es.datahot.n2.68x10x45",
"node_roles": [
"data_content",
"data_hot",
"ingest",
"master",
"remote_cluster_client",
"transform"
]
....
"id": "warm",
"instance_configuration_id": "gcp.es.datawarm.n2.68x10x190",
"node_roles": [
"data_warm",
"remote_cluster_client"
]
- Change the resources above to add a cold tier and apply again with `terraform apply -auto-approve`:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      zone_count = 1
      size       = "2g"
    }
  }
- We'll get:
│ Error: failed updating deployment: 3 errors occurred:
│ * api error: clusters.cluster_invalid_plan: Instance configuration [gcp.es.datacold.n2.68x10x190] does not allow usage of node types [master,ingest]. You must either change instance configuration or use only allowed node types [data]. (resources.elasticsearch[0].cluster_topology[2].instance_configuration_id)
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = cold] (resources.elasticsearch[0])
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = hot_content] (resources.elasticsearch[0])
│
│
│
│ with ec_deployment.multi_tier,
│ on deployment.tf line 17, in resource "ec_deployment" "multi_tier":
│ 17: resource "ec_deployment" "multi_tier" {
- And in the `terraform.tfstate` we can see that the attributes are mixed up: the `cold` id has hot attributes (an instance configuration and node roles such as `ingest` or `master` that are not allowed in `cold`), `hot_content` has warm attributes, and `warm` has empty attributes.
"id": "cold",
"instance_configuration_id": "gcp.es.datahot.n2.68x10x45",
"node_roles": [
"data_content",
"data_hot",
"ingest",
"master",
"remote_cluster_client",
"transform"
],
...
"id": "hot_content",
"instance_configuration_id": "gcp.es.datawarm.n2.68x10x190",
"node_roles": [
"data_warm",
"remote_cluster_client"
],
"id": "warm",
"instance_configuration_id": "",
Context
Trying to add a cold tier to a deployment that already has a hot and a warm tier. Several combinations lead to this same error.
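For instance, a hypothetical frozen-tier variant of the change above (same deployment, only the id of the new topology block differs) runs into the same kind of failure:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "frozen"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }
  }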
Possible Solution
We have found no solution/workaround so far. Once this is hit, we have to use the cloud UI to add or remove tiers, and then run `terraform apply -refresh-only`.
Your Environment
- Version used: Terraform v1.1.7 on darwin_amd64 + provider registry.terraform.io/elastic/ec v0.4.0
- Running against Elastic Cloud SaaS or Elastic Cloud Enterprise and version: ESS, stack version 7.17.1
- Operating System and version: macOS Monterey 12.3.1
Hi, any updates on this issue? It seems I'm stuck with the same bug.
@Kushmaro @jaggederest any idea when and how this can be fixed? It looks like a critical issue to me, as it prevents using major Elastic features. Also, @immavalls, have you maybe found any solution or workaround for this issue? The ticket was opened in April and there have been no updates since.
We are looking into this @AndriiLavrekha, but we can't provide any timelines yet.
@Kushmaro Thank you for the comment. Can you maybe also confirm that the issue affects only the 'cold' and 'frozen' topologies?
I can't @AndriiLavrekha , this needs further investigation to confirm or deny it affects only a single type of tier.
I think this is due to https://github.com/elastic/terraform-provider-ec/issues/336.
Even if you specify the blocks in alphabetical order things don't always work.
In my case the order in the state changes after running `terraform refresh`. I'm trying to find out where this happens, but I've had no luck so far.
The defect is indeed caused by the same logic and limitations that cause #336.
A possible workaround:
If autoscale is disabled
Initial deployment creation
Topology elements (tiers) with non-zero sizes have to be listed in alphabetical order of their id fields, as in the sketch below.
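For example, a minimal sketch of the elasticsearch block for an initial creation that includes cold, hot and warm tiers, with the blocks already sorted by id:
  elasticsearch {
    autoscale = "false"

    # "cold" < "hot_content" < "warm": blocks listed by id.
    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }
  }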
Update - adding a new tier
- add the new tier at the end of the topology list, which is already sorted in alphabetical order (see the sketch after these steps)
- run `terraform apply`
- reorder the topology list into alphabetical order
- check that there are no pending changes with `terraform plan` - it should output an empty diff
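A sketch of the intermediate state for the cold tier from this issue: the new block is appended after the existing, already-sorted blocks for the first `terraform apply`, and only moved into alphabetical position afterwards:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }

    # New tier appended at the end for the first apply; reorder
    # alphabetically (cold, hot_content, warm) in a follow-up change.
    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }
  }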
Update - removing an existing tier
- set the tier's size to 0 (see the sketch after these steps)
- run `terraform apply`
- remove the tier from the topology list
- check that there are no pending changes - `terraform plan` should output an empty diff
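A sketch of the first step when removing, for example, the warm tier (assuming a size of "0g" expresses zero here): the block stays in the list, scaled to zero, for the first `terraform apply`, and is deleted from the configuration only in a follow-up change:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    # Scaled to zero for the first apply; the block is removed from
    # the list in a follow-up change.
    topology {
      id         = "warm"
      size       = "0g"
      zone_count = 1
    }
  }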
If autoscale is enabled
The idea is the same, but it applies to all tiers that either have non-zero sizes or can be resized by autoscaling (which happens when the corresponding deployment template specifies a non-zero autoscaling_max for the tier): all of these tiers should be listed in alphabetical order of their id fields, even if their blocks specify no fields other than id.
However, if the tier's size is zero and the corresponding deployment template doesn't specify autoscaling_max for the tier, or its value is zero, the tier should be omitted from the topology list.
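A minimal sketch of the autoscale-enabled case, assuming the gcp-storage-optimized template defines a non-zero autoscaling_max for the cold tier (an assumption - check your template): the cold block is kept in alphabetical order even though only its id is set:
  elasticsearch {
    autoscale = "true"

    # Listed with only its id: the template (by assumption) allows
    # autoscaling to grow this tier, so it must appear in order.
    topology {
      id = "cold"
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }
  }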
Also, make sure to ignore the size attributes if you'd like to specify initial sizes for tiers - the sizes can be changed later on by the autoscaler. E.g. the following snippet ignores size updates for the entries at indexes 2 and 4 of the topology list:
  lifecycle {
    ignore_changes = [
      elasticsearch[0].topology[2].size,
      elasticsearch[0].topology[4].size
    ]
  }
Closed by https://github.com/elastic/terraform-provider-ec/pull/567