terraform-provider-spotinst
Cluster roll failure when 2 or more VNGs are updated at once
Description
Hello,
We have 2 VNGs (spotinst_ocean_aws_launch_spec) that have the should_roll feature enabled (in order to automate the cluster/VNG roll when configuration changes).
When updating two VNGs in a single terraform apply (for example, an AMI ID change), Terraform fails with the error "Can't have 2 Rolls at the same time. Please stop the previous one". This is one of the reasons we stopped using VNGs for now and use only the default VNG.
Terraform Version
1.3.9
Affected Resource(s)
spotinst_ocean_aws_launch_spec
Terraform Configuration Files
module "ocean-aws-k8s-vng_stateless" {
  source          = "spotinst/ocean-aws-k8s-vng/spotinst"
  name            = "stateless-group" # Name of VNG in Ocean
  ocean_id        = local.ocean_id
  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateless" }]
  spot_percentage = 100 # Change the spot %
  should_roll     = true
}

## Create additional Ocean Virtual Node Group (launchspec) ##
module "ocean-aws-k8s-vng_stateful" {
  source          = "spotinst/ocean-aws-k8s-vng/spotinst"
  name            = "stateful-group" # Name of VNG in Ocean
  ocean_id        = local.ocean_id
  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateful" }]
  taints          = [{ key = "type", value = "stateful", effect = "NoSchedule" }]
  spot_percentage = 0
  #instance_types = ["g4dn.xlarge","g4dn.2xlarge"] # Limit VNG to specific instance types
  should_roll     = true
}
Debug Output
deployment/191/default/spotio": exit status 1
Dynamic environment variables added:
_PASS
module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******1]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******2]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifications complete after 1s [id=ols-*******2]
╷
│ Error: onRoll() -> Roll failed for cluster [ols-*******1], error: POST https://api.spotinst.io/ocean/aws/k8s/cluster/ols-*******1/roll?accountId=act-******: 400 (request: "32217267-9bdb-463a-ad6b-fc1440a6018a") CLUSTER_ROLL_ALREADY_IN_PROGRESS: Can't have 2 Rolls at the same time. Please stop the previous one.
│
│
│ with module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup,
│ on .terraform/modules/ocean-aws-k8s-vng_stateless/main.tf line 2, in resource "spotinst_ocean_aws_launch_spec" "nodegroup":
│ 2: resource "spotinst_ocean_aws_launch_spec" "nodegroup" {
│
Expected Behavior
Terraform shouldn't crash with an error. Cluster roll either needs to complete just once, applying changes to both VNGs, or VNGs need to roll independently at the same time.
Actual Behavior
Terraform crashes with the error "Can't have 2 Rolls at the same time" and fails to roll/apply changes to one of the VNGs.
Steps to Reproduce
- Create 2 VNGs using the spotinst/ocean-aws-k8s-vng/spotinst module with should_roll = true.
- Update image_id to a different image.
- Run terraform apply.
@dmitrykruglov I hit the same issue when trying to upgrade multiple VNGs at once, and I believe it needs to be either fixed or clearly documented in the provider's Terraform docs.
If you want to roll more than one VNG at the same time, you should do that at the Ocean cluster level (example below):
resource "spotinst_ocean_aws" "ocean_cluster" {
  count                = ..........
  name                 = ..........
  controller_id        = ..........
  region               = ..........
  image_id             = ..........
  iam_instance_profile = ..........
  desired_capacity     = ..........
  min_size             = ..........
  max_size             = ..........
  security_groups      = []
  subnet_ids           = ..........
  key_name             = ..........

  update_policy {
    should_roll      = true
    conditioned_roll = true|false
    auto_apply_tags  = true

    roll_config {
      batch_size_percentage        = 33
      launch_spec_ids              = ["ols-a0b****1", "ols-a0b****1"]
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }

  autoscaler {}
}
I managed to test this and it works perfectly fine for a list of VNGs.
The ocean_cluster documentation has the details for the configuration: https://registry.terraform.io/providers/spotinst/spotinst/latest/docs/resources/ocean_aws#update-policy
Hi @dmitrykruglov, the error you encountered while updating 2 VNGs is intended behavior. To roll 2 or more VNGs, configure "update_policy" in the cluster config and pass the list of VNG IDs in launch_spec_ids, as shown in the snippet below.
update_policy {
  should_roll = true

  roll_config {
    batch_size_percentage        = 33
    launch_spec_ids              = ["ols-a0b1", "ols-a0b1"]
    batch_min_healthy_percentage = 20
    respect_pdb                  = true
  }
}
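For anyone landing here from the original configuration, the following is a rough sketch of how the two suggestions above might fit together, not a confirmed fix. It assumes the cluster is managed in the same Terraform configuration, that turning off should_roll on each VNG is what prevents the competing per-VNG rolls (the thread does not state this explicitly), and that the VNG IDs are supplied by hand. The local name rolled_vng_ids and the two IDs are hypothetical placeholders.

```hcl
# Hypothetical placeholder IDs; replace with the real launch spec IDs of your VNGs.
locals {
  rolled_vng_ids = ["ols-11111111", "ols-22222222"]
}

module "ocean-aws-k8s-vng_stateless" {
  source          = "spotinst/ocean-aws-k8s-vng/spotinst"
  name            = "stateless-group"
  ocean_id        = local.ocean_id
  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateless" }]
  spot_percentage = 100

  # Assumption: disable the per-VNG roll so that only the cluster-level roll runs.
  should_roll = false
}

resource "spotinst_ocean_aws" "ocean_cluster" {
  # Required cluster arguments (name, region, subnet_ids, and so on) are omitted here.

  update_policy {
    should_roll = true

    roll_config {
      batch_size_percentage        = 33
      launch_spec_ids              = local.rolled_vng_ids # one roll covering all listed VNGs
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }
}
```

The second VNG module would be changed the same way. Keeping the IDs in a single local avoids repeating them, and avoids a dependency cycle that could arise if the cluster referenced launch spec resources that themselves depend on the cluster's ocean_id.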