terraform-provider-aws
aws_msk_scram_secret_association overrides/removes existing secret associations
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform CLI and Terraform AWS Provider Version
> terraform -v
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/aws v3.21.0
Affected Resource(s)
- aws_msk_scram_secret_association
Terraform Configuration Files
Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.
resource "aws_msk_scram_secret_association" "attach_to_cluster" {
cluster_arn = 'my-cluster'
secret_arn_list = [arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_Tf-Module_Test_20201216135753890700000001-ybcWM3]
}
Expected Behavior
Existing resources that are not defined in this Terraform stack (or that are defined in another stack) should not be removed. This is needed to avoid centralized management of the secrets associated with an MSK cluster.
# aws_msk_scram_secret_association.attach_to_cluster will be updated in-place
~ resource "aws_msk_scram_secret_association" "attach_to_cluster" {
cluster_arn = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
id = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
~ secret_arn_list = [
+ "arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_Tf-Module_Test_20201216135753890700000001-ybcWM3",
]
}
Actual Behavior
Existing secrets are removed from the MSK Cluster in the plan:
# aws_msk_scram_secret_association.attach_to_cluster will be updated in-place
~ resource "aws_msk_scram_secret_association" "attach_to_cluster" {
cluster_arn = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
id = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
~ secret_arn_list = [
- "arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_ST_Akhq_20201013111302148600000001-ATFaph",
- "arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_ST_Schema-Registry_20201013111302148900000002-dTOIjg",
"arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_Tf-Module_Test_20201216135753890700000001-ybcWM3",
]
}
It looks like the provider is trying to remove all the secrets that are not defined in this aws_msk_scram_secret_association.
Steps to Reproduce
- Create an MSK cluster with some secrets associated with it in a first Terraform stack
- Create another Terraform stack that associates new secrets with the previously created MSK cluster
- Run terraform apply
- Run terraform plan
References
Pull request:
- #15302
I experienced this when I submitted my original PR; it happens because the API always returns all the associated secrets, regardless of how they were associated. With most Terraform providers, you would normally read the remote resource and store everything returned, but that makes a huge assumption about how you manage resources, i.e. all through Terraform. Here is the code - https://github.com/hashicorp/terraform-provider-aws/pull/15302/files#diff-2836c697ab96fb1c60919893bfb0dfd0405fb9138d7d9640375ea3fc8686ae81R69
The way I got around this was to filter the secrets based on the inputs and the response from the API in my original PR - https://github.com/dblooman/terraform-provider-aws/blob/58e10f98553d811047dd6f33b1c1d01359814a46/aws/resource_aws_msk_scram_secret.go#L74
I think there are other examples of this behaviour in the provider, so there might be a helper to filter these resources, perhaps @anGie44 knows
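For illustration, here is a minimal sketch of that filtering idea in Go (the helper name is made up, not the provider's actual code): keep only the ARNs that this resource instance configured, rather than everything the ListScramSecrets API returns.
package msk

// filterConfiguredSecretARNs keeps only the secret ARNs that were configured
// on this resource instance, ignoring associations made outside of it.
// Illustrative helper, not the provider's actual implementation.
func filterConfiguredSecretARNs(configured, remote []string) []string {
    configuredSet := make(map[string]struct{}, len(configured))
    for _, arn := range configured {
        configuredSet[arn] = struct{}{}
    }

    filtered := make([]string, 0, len(remote))
    for _, arn := range remote {
        if _, ok := configuredSet[arn]; ok {
            filtered = append(filtered, arn)
        }
    }
    return filtered
}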
Ahh yes, that param is tricky given we're keeping track of secrets associated via the resource vs. those associated outside TF or via another resource. I see your original rationale makes sense, @dblooman -- let me propose a fix to include the logic you had added (it's a bit unconventional, but we'll see if it's a maintainable alternative). Apologies that got left out of the original feature!
I understand you can't use a for_each expression with this resource because only the last iteration of the loop will remain?
Yes, and you also can't use this resource in multiple modules (e.g. maybe each of your microservices is a separate module and you want to handle MSK credentials independently, not centrally).
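For illustration, this is roughly the per-service pattern being described (variable names are made up). With the current behaviour, each instance treats its own list as the complete set for the cluster, so the instances keep planning to remove each other's associations and effectively only the last one applied wins:
resource "aws_msk_scram_secret_association" "per_service" {
  # e.g. var.service_secret_arns = { "svc-a" = "arn:aws:secretsmanager:...", "svc-b" = "..." }
  for_each = var.service_secret_arns

  cluster_arn     = var.msk_cluster_arn
  secret_arn_list = [each.value]
}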
Are there any updates on this?
Is there any update on this?
I'm experiencing this issue and require the ability to assign secrets to MSK per microservice deployment.
AWS provider version: 3.55.0, Terraform version: 0.15.3
We have a module that generates the secret and then will associate it with the MSK cluster. Once that's done we use the mongey kafka provider to set the ACLs for a new user associated with the MSK cluster.
Code from the module:
variables.tf
variable "secret_identifier" {
type = string
description = ""
}
variable "username" {
type = string
description = "Username to be set in the secret as the username to access MSK"
}
variable "auto_generate_password" {
type = bool
description = "Toggle whether the password to be used with the specified username should be autogenerated"
default = false
}
variable "password" {
type = string
description = "Password to be stored in the secret to be the password used to access MSK"
default = null
sensitive = true
}
variable "msk_cluster_arn" {
type = string
description = "ARN of the msk cluster where the user will be attached to"
}
variable "kms_key_id" {
type = string
description = "KMS Key ID used to encrypt the secret that holds the username and password. This is generally created in the msk module"
}
variable "kafka_acls" {
type = list(object({
resource_name_pattern = string
resource_pattern_type = string
resource_type = string
acl_operation = string
acl_permission_type = string
}))
description = "List of configuration to set access control list for the specified user under creation. The kafka provider needed for this should be configured to use admin credentials"
default = []
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Prefixed", "Any", "Match", "Literal"], v.resource_pattern_type)
])
error_message = "Resource Pattern Type can only be one of the following values: Prefixed, Any, Match, Literal."
}
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Unknown", "Any", "Topic", "Group", "Cluster", "TransactionalID"], v.resource_type)
])
error_message = "Resource Type can only be one of the following values: Unknown, Any, Topic, Group, Cluster, TransactionalID."
}
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Unknown", "Any", "All", "Read", "Write", "Create", "Delete", "Alter", "Describe", "ClusterAction", "DescribeConfigs", "AlterConfigs", "IdempotentWrite"], v.acl_operation)
])
error_message = "ACL operation can only be one of the following values: Unknown, Any, All, Read, Write, Create, Delete, Alter, Describe, ClusterAction, DescribeConfigs, AlterConfigs, IdempotentWrite."
}
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Unknown", "Any", "Allow", "Deny"], v.acl_permission_type)
])
error_message = "ACL Permission Type can only be one of the following values: Unknown, Any, Allow, Deny."
}
}
main.tf
locals {
password = var.auto_generate_password == true && var.password == null ? random_password.auto_generated_password[0].result : var.password
}
resource "aws_secretsmanager_secret" "this" {
name = "AmazonMSK_${var.secret_identifier}"
kms_key_id = var.kms_key_id
recovery_window_in_days = 0 //Force delete as we don't have permissions to restore secrets
}
resource "random_password" "auto_generated_password" {
count = var.auto_generate_password == true && var.password == null ? 1 : 0
length = 16
special = true
upper = true
lower = true
override_special = "!@#$_-"
}
resource "aws_secretsmanager_secret_version" "this" {
secret_id = aws_secretsmanager_secret.this.id
secret_string = jsonencode({ username = var.username, password = local.password })
}
resource "aws_secretsmanager_secret_policy" "this" {
secret_arn = aws_secretsmanager_secret.this.arn
policy = data.aws_iam_policy_document.secret_policy.json
}
resource "aws_msk_scram_secret_association" "this" {
depends_on = [
aws_secretsmanager_secret_version.this,
aws_secretsmanager_secret_policy.this
]
cluster_arn = var.msk_cluster_arn
secret_arn_list = [aws_secretsmanager_secret.this.arn]
}
resource "kafka_acl" "this" {
depends_on = [aws_msk_scram_secret_association.this]
for_each = { for acl in var.kafka_acls : "${acl.resource_name_pattern}-${var.username}-${acl.acl_operation}" => acl }
acl_principal = "User:${var.username}"
acl_host = "*"
acl_operation = each.value.acl_operation
acl_permission_type = each.value.acl_permission_type
resource_name = each.value.resource_name_pattern
resource_type = each.value.resource_type
resource_pattern_type_filter = each.value.resource_pattern_type
}
Any updates on this? I have the same issue, but this is independent of whether I am using a for_each. I have already raised it. For now I have a workaround by simply using a separate module for the secret associations.
Hey Terraform team, any update or timeline for this problem? I'm having the same issue, and hoping for at least an official guideline/workaround to smoothly and stably deploy the secret association without overriding the existing ones every time.
Facing this issue too, my workaround is to look up all the secrets and add them:
# HACK!
# Look up all the "AmazonMSK_[ENV]*" secrets and their arns so we can add them
# to the 'secret_arn_list' below.
data "aws_secretsmanager_secrets" "amazon_msk" {
filter {
name = "name"
values = ["AmazonMSK_${local.environment}"]
}
}
resource "aws_msk_scram_secret_association" "scram" {
cluster_arn = var.msk_arn
secret_arn_list = data.aws_secretsmanager_secrets.amazon_msk.arns
}
We tried the MSK user association resource with the following scenario: we provisioned MSK with users, and when we want to associate a new user with the cluster, we append the user to the existing list, but the user does not get associated with the cluster. We tried the lifecycle hook and the data-source hack from the previous comments, but neither worked. Observed behavior: after appending the new user to the existing provisioned users list, the new user's secret is created, but it is not associated with the cluster.
@gdavison any thoughts/pointers?
Alright, I've been struggling to get this working with an SOA setup where our secrets are decentralized. Here's what I did that seems to be working for now. We've got a module that is responsible for generating user credentials, associating those secrets with the cluster, and then generating ACLs for the newly created user. A service imports this module into its project, specifies the topics it wants access to, and everything else is taken care of behind the scenes. Like everyone here, I was running into issues with the secret association overwriting everything that currently existed. I tried a couple of the above hacks with middling success.
But what I did notice is that the CLI call to batch-associate secrets works as you'd expect this resource to - it appends to the list of existing secrets associated with the cluster instead of overwriting! Additionally, the boto3 client call does the same - it just appends! So I removed the aws_msk_scram_secret_association entirely from my module. Instead, I have a lambda invocation resource that is triggered only on ARN change of the aws_secretsmanager_secret_version itself. The lambda code takes the ARN of the secret as well as the ARN of the cluster, uses boto3, and makes the batch secret association call.
Now I don't have to worry about secret associations being wiped and can deploy the module to many services. The only downside is that if I delete the module, the secret association is not removed automatically.
Can you share some more insight into the solution, maybe an example, so it solves most people's issues related to MSK user association using Terraform? @kiley-poole
Certainly.
resource "aws_lambda_invocation" "msk_credentials" {
function_name = {name of lambda function}
input = jsonencode({
cluster_arn = {cluster arn value},
secret_arn = aws_secretsmanager_secret_version.msk_cluster.arn
})
triggers = {
redeployment = sha1(jsonencode([
aws_secretsmanager_secret_version.msk_cluster.arn
]))
}
depends_on = [
aws_secretsmanager_secret_version.msk_cluster,
{cluster arn value}
]
}
Lambda code
import boto3
from botocore.exceptions import ClientError


def associate_secret(args, _):
    client = boto3.client('kafka')
    try:
        client.batch_associate_scram_secret(
            ClusterArn=args['cluster_arn'],
            SecretArnList=[
                args['secret_arn']
            ]
        )
    except ClientError as exc:
        print(exc)
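For completeness, the reverse call could close the cleanup gap mentioned above (this is my own sketch, not part of @kiley-poole's setup; it assumes the same event shape as the associate handler):
import boto3
from botocore.exceptions import ClientError


def disassociate_secret(args, _):
    # Removes only the given secret's association, leaving other
    # associations on the cluster untouched.
    client = boto3.client('kafka')
    try:
        client.batch_disassociate_scram_secret(
            ClusterArn=args['cluster_arn'],
            SecretArnList=[
                args['secret_arn']
            ]
        )
    except ClientError as exc:
        print(exc)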
Hi. Is there any update on this? This does effectively prevent the use of SCRAM associations in a non-centralized manner. Thanks
We've opted for a shell-based solution, but it is really the same as @kiley-poole's lambda idea.
So we have some code to trigger the script:
resource "terraform_data" "secret_association" {
triggers_replace = [
aws_secretsmanager_secret.new_identity.id,
data.aws_msk_cluster.msk_cluster.arn
]
provisioner "local-exec" {
working_dir = "${path.module}/src"
interpreter = ["/bin/bash"]
command = "associate-secret.sh"
environment = {
MSK_CLUSTER_ARN = data.aws_msk_cluster.msk_cluster.arn
IDENTITY_SECRET_ARN = aws_secretsmanager_secret.new_identity.arn
}
}
}
locals {
# Try and ensure this variable is not set until terraform_data.secret_association has executed so outputs are not prematurely available
ensure_secret_associated = terraform_data.secret_association.output == "FORCE_DEPENDENCY_ON_SECRET_BEING_ASSOCIATED" ? true : true
}
output "identity_secret_arn" {
description = "ARN of Secret containing Identity credentials"
value = local.ensure_secret_associated ? aws_secretsmanager_secret.new_identity.arn : null
}
That last bit fixed an issue where Terraform eagerly returns output values, so the caller was trying to use the Secret to make Kafka calls before the association had been made. This just forces a dependency on the output of that step. Happy for any better suggestions.
The shell script itself looks like:
#!/bin/bash
: "${MSK_CLUSTER_ARN:?}"
: "${IDENTITY_SECRET_ARN:?}"
_fn_run_task() {
RESULT=$(aws kafka batch-associate-scram-secret --cluster-arn "${MSK_CLUSTER_ARN}" --secret-arn-list "${IDENTITY_SECRET_ARN}")
ERROR=$?
echo "${RESULT}"
if { [[ $ERROR -ne 0 ]] || [[ "${RESULT}" =~ "ErrorMessage" ]]; } && [[ ! "${RESULT}" =~ "secret is already associated" ]]; then
return 1
else
if [[ "${RESULT}" =~ "secret is already associated" ]]; then
echo -e "\033[0;36mIgnoring error about already associated Secret\033[0m"
fi
return 0
fi
}
# Sleep upfront to hopefully reduce need for retry at all and have a less confusing log
sleep 10
MAX_ATTEMPTS=2
RETRY_COUNT=0
while ! _fn_run_task; do
RETRY_COUNT=$((RETRY_COUNT + 1))
if [[ "${RETRY_COUNT}" -lt "${MAX_ATTEMPTS}" ]]; then
echo -e "\033[0;36mRetrying Secret association (attempt $((RETRY_COUNT + 1)) of ${MAX_ATTEMPTS})...\033[0m"
sleep 5
else
break
fi
done
if [[ ${RETRY_COUNT} -ge "${MAX_ATTEMPTS}" ]]; then
echo -e "\033[1;31mSecret association failed after ${RETRY_COUNT} attempts\033[0m"
exit 1
fi
I added the retry handling because sometimes there was a slight delay before the Secret was actually available for association; in another situation we create a Secret immediately after creating the Cluster itself, and even though Terraform returns after that operation, there is a definite delay before it is usable. We also don't fail if the Secret is already associated, since we can't wholly rely on the Terraform state to know what it has done here. I was pleased to note, however, that if a Secret is deleted, the association is cleaned up automatically.
This does assume you can run the AWS CLI directly, so it needs to be in your Docker image, for example.
Hello, I would like to contribute a PR that fixes this issue, but I am not sure if I should change the existing aws_msk_scram_secret_association resource or create a new aws_msk_single_scram_secret_association resource that looks like this:
resource "aws_msk_single_scram_secret_association" "some-secret-association" {
cluster_arn = data.aws_msk_cluster.some-cluster.arn
secret_arn = aws_secretsmanager_secret.some-secret.arn
}
Changing the existing resource would introduce a breaking change, unless we make the secret_arn_list field optional and introduce a second secret_arn field. One of the two fields needs to be set for validation to succeed.
If we are going with the new resource, feel free to suggest a suitable name.
What would be the best approach going forward?
Thank you @jandillenkofer.
I vote for going with the same resource and secret_arn_list. While theoretically it is indeed a breaking change, I do not see it being a problem in any actual usage scenario (unless someone is using some really nasty workarounds here, maybe).
Thanks @jandillenkofer!
While having the secret_arn_list would be nice, I think also having a resource where you can do a single association would be amazing as well, kinda like how there are single ones for aws_vpc_security_group_ingress_rule and the like.
What would be the difference between having aws_msk_single_scram_secret_association vs using the existing resource and putting only one ARN in secret_arn_list, though?
The difference between aws_msk_single_scram_secret_association and aws_msk_scram_secret_association with just one entry in secret_arn_list would be that removing the aws_msk_single_scram_secret_association would remove just that one connection between the secret and the cluster. When removing the aws_msk_scram_secret_association from the cluster, all associations on the cluster are lost.
I created this fork with an example implementation of aws_msk_single_scram_secret_association. But I am not sure if we can just add a new resource, because the interaction between aws_msk_single_scram_secret_association and aws_msk_scram_secret_association is kinda weird.
Changing the existing resource is complicated as well, because we would need to migrate the old state of the existing resource to a new state with multiple entries, each with a single secret_arn. And it would definitely be a breaking change.
In our use case, we would also favour aws_msk_single_scram_secret_association, as it allows different parts of the code to add/remove associations independently, while the list approach requires all changes to be made by the same part of the code. As mentioned above, this is also similar to what is done in other similar areas of the AWS Terraform world (VPC, for instance).
From @jandillenkofer's initial explanation I understood that it would be possible to keep only one resource but still maintain each SCRAM association separately.
They initially suggested it could be done via the same resource by changing secret_arn_list to be optional and adding a secret_arn field to manage single associations.
Wouldn't it be possible to keep only the current resource and only the secret_arn_list field, but adjust the behaviour so that it only associates and disassociates secrets that were added with that specific resource? This way we could still manage each secret separately if we want, and we could get rid of the current awful behaviour, where one resource can override another resource's associations (which goes against Terraform's best practices, tbh).
This is what I have in mind:
# Associating just a single secret:
resource "aws_msk_scram_secret_association" "single" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn_list = [aws_secretsmanager_secret.single.arn]
depends_on = [aws_secretsmanager_secret_version.single]
}
# Associating a bunch of secrets:
resource "aws_msk_scram_secret_association" "bulk" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn_list = [
aws_secretsmanager_secret.bulk[0].arn,
aws_secretsmanager_secret.bulk[1].arn,
aws_secretsmanager_secret.bulk[2].arn
]
depends_on = [aws_secretsmanager_secret_version.bulk]
}
After applying this, there would be a total of 4 associations made. aws_msk_scram_secret_association.single does not override aws_msk_scram_secret_association.bulk or vice versa.
This would be a breaking change, because it fundamentally changes the way this resource currently works. But arguably, the current version is heavily flawed also.
PS. Not sure if important, but no other resource name in the AWS provider has single in it, even though the mechanism is somewhat similar (e.g. aws_vpc_security_group_ingress_rule).
@ppihus complementing my previous explanation: I'm sure it would still be useful to keep the current behaviour for the use cases where you need to control the full set (to remove all, overwrite, or make sure the content is exactly that). I think that to have both behaviours (full and independent) we would need either separate resources or, at least, a flag/field that modifies the current resource (e.g. mode: incremental/full, or another relevant name).
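Purely as a sketch of that idea, something like the following (the association_mode attribute does not exist today; the name is made up):
resource "aws_msk_scram_secret_association" "example" {
  cluster_arn     = aws_msk_cluster.example.arn
  secret_arn_list = [aws_secretsmanager_secret.example.arn]

  # "incremental" would manage only the ARNs listed here;
  # "full" would keep today's behaviour of owning the whole set.
  association_mode = "incremental"
}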
I understand, but this violates the key principle of idempotency and creates a great risk of configuration drift because of the "last writer wins" pattern.
Are there any other resources in the AWS provider that allow this kind of behaviour?
So @ppihus, I've been thinking about this. What about keeping the resource name the same but adding a new attribute that is just secret_arn, to use when you want to associate a single secret vs using the list option?
So like this:
resource "aws_msk_scram_secret_association" "single" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn = aws_secretsmanager_secret.single.arn
depends_on = [aws_secretsmanager_secret_version.single]
}
Can you elaborate on why this would be better than just having one item in the secret_arn_list?
Also a question about a theoretical scenario. What if you initially have:
resource "aws_msk_scram_secret_association" "example" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn = aws_secretsmanager_secret.single.arn
depends_on = [aws_secretsmanager_secret_version.single]
}
but then change this resource to:
resource "aws_msk_scram_secret_association" "example" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn_list = [aws_secretsmanager_secret.single.arn]
depends_on = [aws_secretsmanager_secret_version.single]
}
Would tf recreate this association even though the end result is exactly the same?
@ppihus I see your point there. I honestly think that the problem is with how it's stored in the state right now, because if you only link one item, the state ends up with everything else that is already associated, thus breaking everything on the next run.
Maybe that's the real fix here: making sure that the state only cares about what was passed in and not everything else that comes back from the aws kafka batch-associate-scram-secret call.
I just found the following comment from @bflad in a PR from 2020.
... If there really is a strong use case for 1:1 management like other association resources, it may be best to consider something along the lines of the following to treat that use case separately in the logic:
- Mark secret_arn_list as Optional + Computed
- Add secret_arn as new TypeString + Optional + Computed + ConflictsWith attribute
- In the import function, support an optional delimited import ID that triggers secondary behavior of d.Set("secret_arn", ...) and d.SetId() with the cluster ARN
- In the read function, perform d.Get("secret_arn") check against the list of secret ARNs to trigger resource recreation
Would this be a fitting solution to our problem?
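If it helps the discussion, here is a rough sketch of what that attribute layout could look like with the Plugin SDK (illustrative only, not the actual provider code):
package msk

import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// Sketch of the layout suggested above: secret_arn_list becomes
// Optional + Computed, and a new single-value secret_arn conflicts with it.
func scramSecretAssociationSchema() map[string]*schema.Schema {
    return map[string]*schema.Schema{
        "cluster_arn": {
            Type:     schema.TypeString,
            Required: true,
            ForceNew: true,
        },
        "secret_arn_list": {
            Type:          schema.TypeSet,
            Optional:      true,
            Computed:      true,
            Elem:          &schema.Schema{Type: schema.TypeString},
            ConflictsWith: []string{"secret_arn"},
        },
        "secret_arn": {
            Type:          schema.TypeString,
            Optional:      true,
            Computed:      true,
            ConflictsWith: []string{"secret_arn_list"},
        },
    }
}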