Failure when adding or removing cold or frozen tiers
Readiness Checklist
- [x] I am running the latest version
- [x] I checked the documentation and found no answer
- [x] I checked to make sure that this issue has not already been filed - possibly related to https://github.com/elastic/terraform-provider-ec/issues/336 and https://github.com/elastic/terraform-provider-ec/issues/343, @Kushmaro asked to file an issue
- [x] I am reporting the issue to the correct repository (for multi-repository projects)
Expected Behavior
Adding a cold or frozen tier to an existing deployment, or removing one from an existing deployment, should succeed.
Current Behavior
We get the error:
│ Error: failed updating deployment: 3 errors occurred:
│ * api error: clusters.cluster_invalid_plan: Instance configuration [gcp.es.datacold.n2.68x10x190] does not allow usage of node types [master,ingest]. You must either change instance configuration or use only allowed node types [data]. (resources.elasticsearch[0].cluster_topology[2].instance_configuration_id)
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = cold] (resources.elasticsearch[0])
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = hot_content] (resources.elasticsearch[0])
│
│
│
│ with ec_deployment.multi_tier,
│ on deployment.tf line 17, in resource "ec_deployment" "multi_tier":
│ 17: resource "ec_deployment" "multi_tier" {
This happens even when the topology blocks are defined in alphabetical order.
Steps to Reproduce
- Create a simple deployment using Terraform with `terraform apply -auto-approve` (it requires previously setting the EC API key with `EC_API_KEY="<ESS_API_KEY>"` and running `terraform init`):
terraform {
  required_version = ">= 0.12.29"

  required_providers {
    ec = {
      source  = "elastic/ec"
      version = "0.4.0"
    }
  }
}

provider "ec" {}

# Create an Elastic Cloud deployment
resource "ec_deployment" "multi_tier" {
  name                   = "multi_tier"
  region                 = "gcp-europe-west3"
  version                = "7.17.1"
  deployment_template_id = "gcp-storage-optimized"

  elasticsearch {
    autoscale = "false"

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      zone_count = 1
      size       = "2g"
    }
  }

  kibana {
    topology {
      size       = "1g"
      zone_count = 1
    }
  }
}
- We get a correct `terraform.tfstate`:
"id": "hot_content",
"instance_configuration_id": "gcp.es.datahot.n2.68x10x45",
"node_roles": [
"data_content",
"data_hot",
"ingest",
"master",
"remote_cluster_client",
"transform"
]
....
"id": "warm",
"instance_configuration_id": "gcp.es.datawarm.n2.68x10x190",
"node_roles": [
"data_warm",
"remote_cluster_client"
]
- Change the resources above to add a cold tier and apply again with `terraform apply -auto-approve`:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      zone_count = 1
      size       = "2g"
    }
  }
- We'll get:
│ Error: failed updating deployment: 3 errors occurred:
│ * api error: clusters.cluster_invalid_plan: Instance configuration [gcp.es.datacold.n2.68x10x190] does not allow usage of node types [master,ingest]. You must either change instance configuration or use only allowed node types [data]. (resources.elasticsearch[0].cluster_topology[2].instance_configuration_id)
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = cold] (resources.elasticsearch[0])
│ * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = hot_content] (resources.elasticsearch[0])
│
│
│
│ with ec_deployment.multi_tier,
│ on deployment.tf line 17, in resource "ec_deployment" "multi_tier":
│ 17: resource "ec_deployment" "multi_tier" {
- And in the `terraform.tfstate` we can see that the attributes are mixed up: the `cold` id has hot attributes (an instance configuration and node roles such as `ingest` or `master` that are not allowed in `cold`), `hot_content` has warm attributes, and `warm` has empty attributes.
"id": "cold",
"instance_configuration_id": "gcp.es.datahot.n2.68x10x45",
"node_roles": [
"data_content",
"data_hot",
"ingest",
"master",
"remote_cluster_client",
"transform"
],
...
"id": "hot_content",
"instance_configuration_id": "gcp.es.datawarm.n2.68x10x190",
"node_roles": [
"data_warm",
"remote_cluster_client"
],
"id": "warm",
"instance_configuration_id": "",
Context
Trying to add a cold tier to a deployment that already has a hot and a warm tier. Several combinations lead to this same error.
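For instance, a hypothetical frozen-tier variant of the change above (same deployment, only the id of the new topology block differs) runs into the same kind of failure:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "frozen"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }
  }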
Possible Solution
We have found no solution/workaround so far. Once this is hit, we have to use the cloud UI to add or remove tiers, and then run `terraform apply -refresh-only`.
Your Environment
- Version used: Terraform v1.1.7 on darwin_amd64 + provider registry.terraform.io/elastic/ec v0.4.0
- Running against Elastic Cloud SaaS or Elastic Cloud Enterprise and version: ESS, stack version 7.17.1
- Operating System and version: macOS Monterey 12.3.1
Hi, any updates on this issue? It seems I'm stuck with the same bug.
@Kushmaro @jaggederest any idea when and how this can be fixed? It looks like a critical issue to me, as it prevents using major Elastic features. Also, @immavalls, have you maybe found any solution or workaround for this issue? The ticket was opened in April and there have been no updates since.
We are looking into this @AndriiLavrekha, but we can't provide any timelines yet.
@Kushmaro Thank you for the comment. Can you maybe also confirm that the issue affects only the 'cold' and 'frozen' topologies?
I can't @AndriiLavrekha , this needs further investigation to confirm or deny it affects only a single type of tier.
I think this is due to https://github.com/elastic/terraform-provider-ec/issues/336.
Even if you specify the blocks in alphabetical order things don't always work.
In my case the order in the state changes after running `terraform refresh`. I'm trying to find out where this happens, but I've had no luck so far.
The defect is indeed caused by the same logic and limitations that cause #336.
A possible workaround:
If autoscale is disabled
Initial deployment creation
Topology elements (tiers) with non-zero sizes have to be listed in alphabetical order of their id fields, as in the sketch below.
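For example, a minimal sketch of the elasticsearch block for an initial creation that includes cold, hot and warm tiers, with the blocks already sorted by id:
  elasticsearch {
    autoscale = "false"

    # "cold" < "hot_content" < "warm": blocks listed by id.
    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }
  }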
Update - adding a new tier
- add the new tier at the end of the topology list, which is already sorted in alphabetical order (see the sketch after these steps)
- run `terraform apply`
- reorder the topology list into alphabetical order
- check that there are no pending changes with `terraform plan` - it should output an empty diff
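A sketch of the intermediate state for the cold tier from this issue: the new block is appended after the existing, already-sorted blocks for the first `terraform apply`, and only moved into alphabetical position afterwards:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }

    # New tier appended at the end for the first apply; reorder
    # alphabetically (cold, hot_content, warm) in a follow-up change.
    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }
  }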
Update - removing an existing tier
- set the tier's size to 0 (see the sketch after these steps)
- run `terraform apply`
- remove the tier from the topology list
- check that there are no pending changes - `terraform plan` should output an empty diff
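A sketch of the first step when removing, for example, the warm tier (assuming a size of "0g" expresses zero here): the block stays in the list, scaled to zero, for the first `terraform apply`, and is deleted from the configuration only in a follow-up change:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    # Scaled to zero for the first apply; the block is removed from
    # the list in a follow-up change.
    topology {
      id         = "warm"
      size       = "0g"
      zone_count = 1
    }
  }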
If autoscale is enabled
The idea is the same, but it applies to all tiers that either have non-zero sizes or can be resized by autoscaling (which happens when the corresponding deployment template specifies a non-zero autoscaling_max for the tier): all of these tiers should be listed in alphabetical order of their id fields, even if their blocks specify no fields other than id.
However, if the tier's size is zero and the corresponding deployment template doesn't specify autoscaling_max for the tier, or its value is zero, the tier should be omitted from the topology list.
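A minimal sketch of the autoscale-enabled case, assuming the gcp-storage-optimized template defines a non-zero autoscaling_max for the cold tier (an assumption - check your template): the cold block is kept in alphabetical order even though only its id is set:
  elasticsearch {
    autoscale = "true"

    # Listed with only its id: the template (by assumption) allows
    # autoscaling to grow this tier, so it must appear in order.
    topology {
      id = "cold"
    }

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    topology {
      id         = "warm"
      size       = "2g"
      zone_count = 1
    }
  }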
Also, make sure to ignore the size attributes if you'd like to specify initial sizes for tiers - the sizes can be changed later on by the autoscaler. E.g. the following snippet ignores size updates for the entries at indexes 2 and 4 of the topology list:
  lifecycle {
    ignore_changes = [
      elasticsearch[0].topology[2].size,
      elasticsearch[0].topology[4].size
    ]
  }
Closed by https://github.com/elastic/terraform-provider-ec/pull/567