terraform-provider-spotinst Cluster roll failure when 2 or more VNGs are updated at once

Description

Hello,

We have 2 VNGs (spotinst_ocean_aws_launch_spec) with the should_roll feature enabled, so that a cluster/VNG roll is triggered automatically when the configuration changes. When both VNGs are updated in a single terraform apply (for example, an AMI ID change), Terraform fails with the error "Can't have 2 Rolls at the same time. Please stop the previous one". This is one of the reasons we have stopped using VNGs for now and only use the default VNG.

Terraform Version

1.3.9

Affected Resource(s)

spotinst_ocean_aws_launch_spec

Terraform Configuration Files

module "ocean-aws-k8s-vng_stateless" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateless-group" # Name of VNG in Ocean
  ocean_id = local.ocean_id

  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateless" }]
  spot_percentage = 100 # Change the spot %

  should_roll = true
}

## Create additional Ocean Virtual Node Group (launchspec) ##
module "ocean-aws-k8s-vng_stateful" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateful-group" # Name of VNG in Ocean
  ocean_id = local.ocean_id

  image_id        = "ami-07bccaac087171156"
  labels          = [{ key = "type", value = "stateful" }]
  taints          = [{ key = "type", value = "stateful", effect = "NoSchedule" }]
  spot_percentage = 0
  # instance_types = ["g4dn.xlarge", "g4dn.2xlarge"] # Limit VNG to specific instance types

  should_roll = true
}

Debug Output

deployment/191/default/spotio": exit status 1
Dynamic environment variables added:
_PASS

module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******1]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifying... [id=ols-*******2]
module.ocean-aws-k8s-vng_stateful.spotinst_ocean_aws_launch_spec.nodegroup: Modifications complete after 1s [id=ols-*******2]
╷
│ Error: onRoll() -> Roll failed for cluster [ols-*******1], error: POST https://api.spotinst.io/ocean/aws/k8s/cluster/ols-*******1/roll?accountId=act-******: 400 (request: "32217267-9bdb-463a-ad6b-fc1440a6018a") CLUSTER_ROLL_ALREADY_IN_PROGRESS: Can't have 2 Rolls at the same time. Please stop the previous one.
│ 
│ 
│   with module.ocean-aws-k8s-vng_stateless.spotinst_ocean_aws_launch_spec.nodegroup,
│   on .terraform/modules/ocean-aws-k8s-vng_stateless/main.tf line 2, in resource "spotinst_ocean_aws_launch_spec" "nodegroup":
│    2: resource "spotinst_ocean_aws_launch_spec" "nodegroup" {
│

Expected Behavior

Terraform shouldn't crash with an error. Cluster roll either needs to complete just once, applying changes to both VNGs, or VNGs need to roll independently at the same time.

Actual Behavior

Terraform crashes with the error "Can't have 2 Rolls at the same time" and fails to roll/apply changes to one of the VNGs.

Steps to Reproduce

1. Create 2 VNGs using the spotinst/ocean-aws-k8s-vng/spotinst module with should_roll = true.
2. Update image_id to a different image.
3. Run terraform apply.

Jul 14 '23 19:07 dmitrykruglov

@dmitrykruglov I hit the same issue when trying to upgrade multiple VNGs at once, and I believe it needs to be either fixed or clearly documented in the provider's Terraform docs.

If you want to roll more than one VNG at the same time, you should do that at the Ocean cluster level (example below):

resource "spotinst_ocean_aws" "ocean_cluster" {
  count                = ..........
  name                 = ..........
  controller_id        = ..........
  region               = ..........
  image_id             = ..........
  iam_instance_profile = ..........
  desired_capacity = ..........
  min_size         = ..........
  max_size         = ..........
  security_groups = []
  subnet_ids           = ..........
  key_name             = ..........
  
  update_policy {
    should_roll      = true 
    conditioned_roll = true # or false
    auto_apply_tags  = true

    roll_config {
      batch_size_percentage        = 33
      launch_spec_ids              = ["ols-a0b****1", "ols-a0b****1"]
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }
   
  autoscaler {}
}

I managed to test this and it works perfectly fine for a list of VNGs.

The ocean_cluster documentation has the details for the configuration: https://registry.terraform.io/providers/spotinst/spotinst/latest/docs/resources/ocean_aws#update-policy
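
A rough sketch of how this might look next to the two VNG modules from the original report. It assumes that should_roll is switched off on each VNG so that only the cluster-level update_policy starts a roll. That assumption is not confirmed anywhere in this thread, and the launch spec IDs are hypothetical placeholders. The other required cluster arguments are omitted, so this is a fragment rather than a complete configuration:

```hcl
# Assumption: per-VNG rolls are disabled so one apply does not start two rolls.
module "ocean-aws-k8s-vng_stateless" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateless-group"
  ocean_id = local.ocean_id
  image_id = "ami-07bccaac087171156"

  should_roll = false
}

module "ocean-aws-k8s-vng_stateful" {
  source = "spotinst/ocean-aws-k8s-vng/spotinst"

  name     = "stateful-group"
  ocean_id = local.ocean_id
  image_id = "ami-07bccaac087171156"

  should_roll = false
}

resource "spotinst_ocean_aws" "ocean_cluster" {
  # Other cluster arguments (name, region, subnet_ids, ...) omitted here.

  update_policy {
    should_roll = true

    roll_config {
      batch_size_percentage = 33
      # Hypothetical placeholder IDs: replace with the IDs of both VNGs.
      launch_spec_ids              = ["ols-xxxxxxx1", "ols-xxxxxxx2"]
      batch_min_healthy_percentage = 20
      respect_pdb                  = true
    }
  }
}
```

Exactly when the cluster-level roll is triggered depends on the provider's behavior, so it is worth checking the update-policy documentation linked above before relying on this layout.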

Sep 01 '23 15:09 ilijad1

Hi @dmitrykruglov, the error you encountered while updating 2 VNGs is intended behavior. To update 2 or more VNGs, configure "update_policy" in the cluster config and pass a list of VNG IDs, as shown in the snippet below.

update_policy {
  should_roll = true

  roll_config {
    batch_size_percentage        = 33
    launch_spec_ids              = ["ols-a0b1", "ols-a0b1"]
    batch_min_healthy_percentage = 20
    respect_pdb                  = true
  }
}

Apr 16 '24 10:04 sharadkesarwani
