terraform-aws-msk-kafka-cluster

Random number generator is not updating configuration resource name on create before destroy

Open · dabmajor opened this issue 1 year ago · 1 comment

Description

When upgrading an MSK cluster to a new Kafka version, version 2.5.0 of this Terraform module needs to replace the MSK configuration resource. The create_before_destroy lifecycle is required because you cannot destroy the configuration attached to an active cluster. However, the module's random number generator does not generate a new name on replacement, so the create-before-destroy fails with a name conflict.

See this line of code in your module: https://github.com/terraform-aws-modules/terraform-aws-msk-kafka-cluster/blob/master/main.tf#L259
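For reference, the failing pattern looks roughly like this (paraphrased here, not the module's exact source): the random_id has no keepers, so it is generated exactly once and never replaced, and the configuration name built from it stays constant across replacements.

# Paraphrase of the module's pattern -- resource names are approximate.
resource "random_id" "this" {
  count = var.create && var.create_configuration ? 1 : 0

  byte_length = 8
  # No keepers: the ID is generated once and never regenerated, even
  # when aws_msk_configuration.this is replaced.
}

resource "aws_msk_configuration" "this" {
  count = var.create && var.create_configuration ? 1 : 0

  # The random suffix never changes, so every create-before-destroy
  # replacement attempts to reuse the same name.
  name              = format("%s-%s", coalesce(var.configuration_name, var.name), random_id.this[0].dec)
  kafka_versions    = [var.kafka_version] # changing this forces replacement
  server_properties = join("\n", [for k, v in var.configuration_server_properties : "${k} = ${v}"])

  lifecycle {
    create_before_destroy = true
  }
}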

  • [X] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version: 2.5.0

  • Terraform version: 1.7.5

  • Provider version(s): 5.42.0

Expected behavior

Create before destroy should create a new aws_msk_configuration with a new, unique name from the random number generator every single time it is run.

Actual behavior

After the first successful recreation, the create-before-destroy on this resource creates a new aws_msk_configuration with the same random number that was generated on the first creation run, causing Terraform to fail.

Terminal Output

.
.
  # module.msk["REDACTED"].module.msk.aws_msk_configuration.this[0] must be replaced
+/- resource "aws_msk_configuration" "this" {
      ~ arn             = "arn:aws:kafka:REDACTED:REDACTED:configuration/REDACTED" -> (known after apply)
      ~ id              = "arn:aws:kafka:REDACTED:REDACTED:configuration/REDACTED" -> (known after apply)
      ~ kafka_versions  = [ # forces replacement
          - "3.4.0",
          + "3.5.1",
        ]
      ~ latest_revision = 1 -> (known after apply)
        name            = "REDACTED-123456789"        <-- notice the name is not changing
    }
.
.
.
Error: creating MSK Configuration: operation error Kafka: CreateConfiguration, https response error StatusCode: 409, RequestID: ********* ConflictException: A resource with this name already exists.

Possible Solutions

The random number generator was a great idea, but it only works for the first recreation. You could ensure the value is always unique by deriving it from the inputs that force replacement, something like this:

locals {
  config_as_string = join(",", [for key, value in var.configuration_server_properties : "${key}=${value}"])

  # Last 8 hex characters of the hash: offset -8 counts back from the end
  # of the string, and length -1 returns the remainder of the string.
  unique_config_suffix = substr(sha256("${local.config_as_string}${var.kafka_version}"), -8, -1)
}

name = format("%s-%s", coalesce(var.configuration_name, var.name), local.unique_config_suffix)

Doing this, I believe any time either of these variables changes, it would not only trigger the resource recreation but also generate a completely new 8-character suffix. I haven't tested this, but I think something like this would work. You may also want to fold one additional value into the hash, so you don't end up with hash collisions when two configs have the same name and the same configuration values; see the sketch below.
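For example, folding the cluster name into the hash input would keep the suffixes distinct even when two clusters share identical properties and Kafka versions (again untested; var.name here is the module's existing cluster name variable):

locals {
  config_as_string = join(",", [for key, value in var.configuration_server_properties : "${key}=${value}"])

  # var.name is hashed in as a disambiguator so that two clusters with the
  # same configuration name, properties, and Kafka version still end up
  # with different 8-character suffixes.
  unique_config_suffix = substr(sha256("${var.name}${local.config_as_string}${var.kafka_version}"), -8, -1)
}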

Another option might be to put a replace_triggered_by on the random number generator, but I have never tried this particular lifecycle block, so I am not 100% sure whether it would get triggered properly in a create_before_destroy scenario:

resource "random_id" "this" {
  count = var.create && var.create_configuration ? 1 : 0

  byte_length = 8
  
  lifecycle {
    replace_triggered_by = [
      aws_msk_configuration.this
    ]
  }
}
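A third possibility, equally untested, is the keepers argument that the random_id resource supports natively: the ID is regenerated whenever any of the listed values change, with no extra lifecycle wiring. A minimal sketch:

resource "random_id" "this" {
  count = var.create && var.create_configuration ? 1 : 0

  byte_length = 8

  # keepers is built into the random provider: when any of these values
  # change, a new ID is generated, which gives the replacement
  # aws_msk_configuration a fresh, unique name.
  keepers = {
    kafka_version     = var.kafka_version
    server_properties = join(",", [for k, v in var.configuration_server_properties : "${k}=${v}"])
  }
}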

dabmajor avatar Apr 10 '24 01:04 dabmajor

For anyone currently trying to upgrade their clusters who is blocked by this, you can still finish your MSK upgrade by following these steps:

  1. create a temporary MSK configuration with the new version, copied from the target cluster (a throwaway Terraform sketch for this step is below)
  2. manually upgrade your cluster using the temporary configuration
  3. delete the previously conflicting configuration
  4. re-run the failed terraform
  5. verify that the replacement configuration was created and applied to the target cluster
  6. delete your temporary configuration
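If you would rather script step 1 than click through the console, a throwaway Terraform configuration along these lines would do it; the name, version, and server properties below are placeholders, so copy the real values from the configuration attached to your target cluster:

# Hypothetical standalone resource for the temporary configuration in
# step 1 -- name, version, and properties are placeholders.
resource "aws_msk_configuration" "temporary" {
  name           = "upgrade-temp-config"
  kafka_versions = ["3.5.1"] # the version you are upgrading to

  server_properties = <<-PROPS
    auto.create.topics.enable = false
    default.replication.factor = 3
  PROPS
}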

Note: this process is only correct if you are not using MSK Replicator to replicate to another region.

dabmajor avatar Apr 10 '24 02:04 dabmajor

Same problem here. I wonder if the AWS provider resource is wrong here. I upgraded via the UI and it reused the same configuration, so why is kafka_versions a change that forces replacement?

hameno avatar May 23 '24 07:05 hameno

@GreggSchofield ☝🏽

bryantbiggs avatar May 23 '24 11:05 bryantbiggs

I have written a working solution in my own environment; I had to fork this repo to get it working. I agree that the provider itself could likely handle this differently, and then we wouldn't have to code around it. I've also found that MSK itself lacks some maturity, with API/CLI limitations manifesting as what we perceive to be a provider limitation.


dabmajor avatar May 23 '24 11:05 dabmajor

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] avatar Jun 23 '24 00:06 github-actions[bot]

This issue was automatically closed because it has been stale for 10 days

github-actions[bot] avatar Jul 04 '24 00:07 github-actions[bot]

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Aug 03 '24 02:08 github-actions[bot]