terraform-provider-aws
aws_msk_scram_secret_association overrides/removes existing secret associations
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform CLI and Terraform AWS Provider Version
> terraform -v
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/aws v3.21.0
Affected Resource(s)
- aws_msk_scram_secret_association
Terraform Configuration Files
Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.
resource "aws_msk_scram_secret_association" "attach_to_cluster" {
cluster_arn = 'my-cluster'
secret_arn_list = [arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_Tf-Module_Test_20201216135753890700000001-ybcWM3]
}
Expected Behavior
Existing resources that are not defined in this Terraform stack (or that are defined in another stack) should not be removed. This is needed to avoid centralized management of the secrets associated with an MSK cluster.
# aws_msk_scram_secret_association.attach_to_cluster will be updated in-place
~ resource "aws_msk_scram_secret_association" "attach_to_cluster" {
cluster_arn = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
id = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
~ secret_arn_list = [
+ "arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_Tf-Module_Test_20201216135753890700000001-ybcWM3",
]
}
Actual Behavior
Existing secrets are removed from the MSK Cluster in the plan:
# aws_msk_scram_secret_association.attach_to_cluster will be updated in-place
~ resource "aws_msk_scram_secret_association" "attach_to_cluster" {
cluster_arn = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
id = "arn:aws:kafka:us-west-1:XXX:cluster/my-custer/9999-9999-9999-9"
~ secret_arn_list = [
- "arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_ST_Akhq_20201013111302148600000001-ATFaph",
- "arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_ST_Schema-Registry_20201013111302148900000002-dTOIjg",
"arn:aws:secretsmanager:us-west-1:XXX:secret:AmazonMSK_Tf-Module_Test_20201216135753890700000001-ybcWM3",
]
}
It looks like the provider is trying to remove all the secrets that are not defined in this aws_msk_scram_secret_association.
Steps to Reproduce
- Create an MSK cluster with some secrets associated with it in a first Terraform stack
- Create another Terraform stack that associates new secrets with the previously created MSK cluster
- Run terraform apply
- Run terraform plan
References
Pull request:
- #15302
I experienced this when I submitted my original PR; it happens because the API always returns all the associated secrets, regardless of how they were associated. With most Terraform providers, you would normally read the remote resource and store everything returned, but that makes a huge assumption about how you manage resources, i.e. all through Terraform. Here is the code - https://github.com/hashicorp/terraform-provider-aws/pull/15302/files#diff-2836c697ab96fb1c60919893bfb0dfd0405fb9138d7d9640375ea3fc8686ae81R69
The way I got around this was to filter the secrets based on the inputs and the response from the API in my original PR - https://github.com/dblooman/terraform-provider-aws/blob/58e10f98553d811047dd6f33b1c1d01359814a46/aws/resource_aws_msk_scram_secret.go#L74
I think there are other examples of this behaviour in the provider, so there might be a helper to filter these resources, perhaps @anGie44 knows
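For illustration, here is a minimal sketch of that filtering idea in Go (the helper name is made up, not the provider's actual code): keep only the ARNs that this resource instance configured, rather than everything the ListScramSecrets API returns.
package msk

// filterConfiguredSecretARNs keeps only the secret ARNs that were configured
// on this resource instance, ignoring associations made outside of it.
// Illustrative helper, not the provider's actual implementation.
func filterConfiguredSecretARNs(configured, remote []string) []string {
    configuredSet := make(map[string]struct{}, len(configured))
    for _, arn := range configured {
        configuredSet[arn] = struct{}{}
    }

    filtered := make([]string, 0, len(remote))
    for _, arn := range remote {
        if _, ok := configuredSet[arn]; ok {
            filtered = append(filtered, arn)
        }
    }
    return filtered
}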
Ahh yes, that param is tricky given we're keeping track of secrets associated via the resource vs. those associated outside TF or via another resource. I see your original rationale makes sense, @dblooman -- let me propose a fix to include the logic you had added (it's a bit unconventional, but we'll see if it's a maintainable alternative). Apologies that got left out of the original feature!
I understand you can't use a for_each expression with this resource because only the last iteration of the loop will remain?
Yes, and you also can't use this resource in multiple modules (e.g. maybe each of your microservices is a separate module and you want to handle MSK credentials independently, not centrally).
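For illustration, this is roughly the per-service pattern being described (variable names are made up). With the current behaviour, each instance treats its own list as the complete set for the cluster, so the instances keep planning to remove each other's associations and effectively only the last one applied wins:
resource "aws_msk_scram_secret_association" "per_service" {
  # e.g. var.service_secret_arns = { "svc-a" = "arn:aws:secretsmanager:...", "svc-b" = "..." }
  for_each = var.service_secret_arns

  cluster_arn     = var.msk_cluster_arn
  secret_arn_list = [each.value]
}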
Are there any updates on this?
Is there any update on this?
I'm experiencing this issue and require the ability to assign secrets to MSK per microservice deployment.
AWS provider version: 3.55.0, Terraform version: 0.15.3
We have a module that generates the secret and then will associate it with the MSK cluster. Once that's done we use the mongey kafka provider to set the ACLs for a new user associated with the MSK cluster.
Code from the module:
variables.tf
variable "secret_identifier" {
type = string
description = ""
}
variable "username" {
type = string
description = "Username to be set in the secret as the username to access MSK"
}
variable "auto_generate_password" {
type = bool
description = "Toggle whether the password to be used with the specified username should be autogenerated"
default = false
}
variable "password" {
type = string
description = "Password to be stored in the secret to be the password used to access MSK"
default = null
sensitive = true
}
variable "msk_cluster_arn" {
type = string
description = "ARN of the msk cluster where the user will be attached to"
}
variable "kms_key_id" {
type = string
description = "KMS Key ID used to encrypt the secret that holds the username and password. This is generally created in the msk module"
}
variable "kafka_acls" {
type = list(object({
resource_name_pattern = string
resource_pattern_type = string
resource_type = string
acl_operation = string
acl_permission_type = string
}))
description = "List of configuration to set access control list for the specified user under creation. The kafka provider needed for this should be configured to use admin credentials"
default = []
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Prefixed", "Any", "Match", "Literal"], v.resource_pattern_type)
])
error_message = "Resource Pattern Type can only be one of the following values: Prefixed, Any, Match, Literal."
}
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Unknown", "Any", "Topic", "Group", "Cluster", "TransactionalID"], v.resource_type)
])
error_message = "Resource Type can only be one of the following values: Unknown, Any, Topic, Group, Cluster, TransactionalID."
}
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Unknown", "Any", "All", "Read", "Write", "Create", "Delete", "Alter", "Describe", "ClusterAction", "DescribeConfigs", "AlterConfigs", "IdempotentWrite"], v.acl_operation)
])
error_message = "ACL operation can only be one of the following values: Unknown, Any, All, Read, Write, Create, Delete, Alter, Describe, ClusterAction, DescribeConfigs, AlterConfigs, IdempotentWrite."
}
validation {
condition = alltrue([
for v in var.kafka_acls : contains(["Unknown", "Any", "Allow", "Deny"], v.acl_permission_type)
])
error_message = "ACL Permission Type can only be one of the following values: Unknown, Any, Allow, Deny."
}
}
main.tf
locals {
password = var.auto_generate_password == true && var.password == null ? random_password.auto_generated_password[0].result : var.password
}
resource "aws_secretsmanager_secret" "this" {
name = "AmazonMSK_${var.secret_identifier}"
kms_key_id = var.kms_key_id
recovery_window_in_days = 0 //Force delete as we don't have permissions to restore secrets
}
resource "random_password" "auto_generated_password" {
count = var.auto_generate_password == true && var.password == null ? 1 : 0
length = 16
special = true
upper = true
lower = true
override_special = "!@#$_-"
}
resource "aws_secretsmanager_secret_version" "this" {
secret_id = aws_secretsmanager_secret.this.id
secret_string = jsonencode({ username = var.username, password = local.password })
}
resource "aws_secretsmanager_secret_policy" "this" {
secret_arn = aws_secretsmanager_secret.this.arn
policy = data.aws_iam_policy_document.secret_policy.json
}
resource "aws_msk_scram_secret_association" "this" {
depends_on = [
aws_secretsmanager_secret_version.this,
aws_secretsmanager_secret_policy.this
]
cluster_arn = var.msk_cluster_arn
secret_arn_list = [aws_secretsmanager_secret.this.arn]
}
resource "kafka_acl" "this" {
depends_on = [aws_msk_scram_secret_association.this]
for_each = { for acl in var.kafka_acls : "${acl.resource_name_pattern}-${var.username}-${acl.acl_operation}" => acl }
acl_principal = "User:${var.username}"
acl_host = "*"
acl_operation = each.value.acl_operation
acl_permission_type = each.value.acl_permission_type
resource_name = each.value.resource_name_pattern
resource_type = each.value.resource_type
resource_pattern_type_filter = each.value.resource_pattern_type
}
Any updates on this? I have the same issue, but this is independent of whether I am using a for_each. I have already raised it. For now I have a workaround by simply using a separate module for the secret associations.
Hey Terraform team, any update or timeline for this problem? I'm having the same issue, and hoping for at least an official guideline/workaround to smoothly and stably deploy the secret association without overriding the existing ones every time.
Facing this issue too, my workaround is to look up all the secrets and add them:
# HACK!
# Look up all the "AmazonMSK_[ENV]*" secrets and their arns so we can add them
# to the 'secret_arn_list' below.
data "aws_secretsmanager_secrets" "amazon_msk" {
filter {
name = "name"
values = ["AmazonMSK_${local.environment}"]
}
}
resource "aws_msk_scram_secret_association" "scram" {
cluster_arn = var.msk_arn
secret_arn_list = data.aws_secretsmanager_secrets.amazon_msk.arns
}
We tried the MSK user association resource with the following scenario: we provisioned MSK with users, and when we want to associate a new user with the cluster, we append the user to the existing list, but the user does not get associated with the cluster. We tried the lifecycle hook and the data-source hack from the previous comments, but neither worked. Observed behavior: after appending the new user to the existing provisioned users list, the new user's secret is created, but it is not associated with the cluster.
@gdavison any thoughts/pointers?
Alright, I've been struggling to get this working with an SOA setup where our secrets are decentralized. Here's what I did that seems to be working for now. We've got a module that is responsible for generating user credentials, associating those secrets with the cluster, and then generating ACLs for the newly created user. A service imports this module into its project, specifies the topics it wants access to, and everything else is taken care of behind the scenes. Like everyone here, I was running into issues with the secret association overwriting everything that currently existed. I tried a couple of the above hacks with middling success.
But what I did notice is that the CLI call to batch-associate secrets works as you'd expect this resource to - it appends to the list of existing secrets associated with the cluster instead of overwriting! Additionally, the boto3 client call does the same - it just appends! So I removed the aws_msk_scram_secret_association entirely from my module. Instead, I have a lambda invocation resource that is triggered only on ARN change of the aws_secretsmanager_secret_version itself. The lambda code takes the ARN of the secret as well as the ARN of the cluster, uses boto3, and makes the batch secret association call.
Now I don't have to worry about secret associations being wiped and can deploy the module to many services. The only downside is that if I delete the module, the secret association is not removed automatically.
Can you share some more insight into the solution, maybe an example, so it solves most people's issues related to MSK user association using Terraform? @kiley-poole
Certainly.
resource "aws_lambda_invocation" "msk_credentials" {
function_name = {name of lambda function}
input = jsonencode({
cluster_arn = {cluster arn value},
secret_arn = aws_secretsmanager_secret_version.msk_cluster.arn
})
triggers = {
redeployment = sha1(jsonencode([
aws_secretsmanager_secret_version.msk_cluster.arn
]))
}
depends_on = [
aws_secretsmanager_secret_version.msk_cluster,
{cluster arn value}
]
}
Lambda code
import boto3
from botocore.exceptions import ClientError


def associate_secret(args, _):
    client = boto3.client('kafka')
    try:
        client.batch_associate_scram_secret(
            ClusterArn=args['cluster_arn'],
            SecretArnList=[
                args['secret_arn']
            ]
        )
    except ClientError as exc:
        print(exc)
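For completeness, the reverse call could close the cleanup gap mentioned above (this is my own sketch, not part of @kiley-poole's setup; it assumes the same event shape as the associate handler):
import boto3
from botocore.exceptions import ClientError


def disassociate_secret(args, _):
    # Removes only the given secret's association, leaving other
    # associations on the cluster untouched.
    client = boto3.client('kafka')
    try:
        client.batch_disassociate_scram_secret(
            ClusterArn=args['cluster_arn'],
            SecretArnList=[
                args['secret_arn']
            ]
        )
    except ClientError as exc:
        print(exc)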
Hi. Is there any update on this? This does effectively prevent the use of SCRAM associations in a non-centralized manner. Thanks
We've opted for a shell-based solution, but it is really the same as @kiley-poole's lambda idea.
So we have some code to trigger the script:
resource "terraform_data" "secret_association" {
triggers_replace = [
aws_secretsmanager_secret.new_identity.id,
data.aws_msk_cluster.msk_cluster.arn
]
provisioner "local-exec" {
working_dir = "${path.module}/src"
interpreter = ["/bin/bash"]
command = "associate-secret.sh"
environment = {
MSK_CLUSTER_ARN = data.aws_msk_cluster.msk_cluster.arn
IDENTITY_SECRET_ARN = aws_secretsmanager_secret.new_identity.arn
}
}
}
locals {
# Try and ensure this variable is not set until terraform_data.secret_association has executed so outputs are not prematurely available
ensure_secret_associated = terraform_data.secret_association.output == "FORCE_DEPENDENCY_ON_SECRET_BEING_ASSOCIATED" ? true : true
}
output "identity_secret_arn" {
description = "ARN of Secret containing Identity credentials"
value = local.ensure_secret_associated ? aws_secretsmanager_secret.new_identity.arn : null
}
That last bit fixed an issue where Terraform eagerly returns output values, so the caller was trying to use the Secret to make Kafka calls before the association had been made. This just forces a dependency on the output of that step. Happy for any better suggestions.
The shell script itself looks like:
#!/bin/bash
: "${MSK_CLUSTER_ARN:?}"
: "${IDENTITY_SECRET_ARN:?}"
_fn_run_task() {
RESULT=$(aws kafka batch-associate-scram-secret --cluster-arn "${MSK_CLUSTER_ARN}" --secret-arn-list "${IDENTITY_SECRET_ARN}")
ERROR=$?
echo "${RESULT}"
if { [[ $ERROR -ne 0 ]] || [[ "${RESULT}" =~ "ErrorMessage" ]]; } && [[ ! "${RESULT}" =~ "secret is already associated" ]]; then
return 1
else
if [[ "${RESULT}" =~ "secret is already associated" ]]; then
echo -e "\033[0;36mIgnoring error about already associated Secret\033[0m"
fi
return 0
fi
}
# Sleep upfront to hopefully reduce need for retry at all and have a less confusing log
sleep 10
MAX_ATTEMPTS=2
RETRY_COUNT=0
while ! _fn_run_task; do
RETRY_COUNT=$((RETRY_COUNT + 1))
if [[ "${RETRY_COUNT}" -lt "${MAX_ATTEMPTS}" ]]; then
echo -e "\033[0;36mRetrying Secret association (attempt $((RETRY_COUNT + 1)) of ${MAX_ATTEMPTS})...\033[0m"
sleep 5
else
break
fi
done
if [[ ${RETRY_COUNT} -ge "${MAX_ATTEMPTS}" ]]; then
echo -e "\033[1;31mSecret association failed after ${RETRY_COUNT} attempts\033[0m"
exit 1
fi
I added the retry handling because sometimes there was a slight delay before the Secret was actually available for association; in another situation we create a Secret immediately after creating the Cluster itself, and even though Terraform returns after that operation, there is a definite delay before it is usable. We also don't fail if the Secret is already associated, since we can't wholly rely on the Terraform state to know what it has done here. I was pleased to note, however, that if a Secret is deleted, the association is cleaned up automatically.
This does assume you can run the AWS CLI directly, so it needs to be in your Docker image, for example.
Hello, I would like to contribute a PR that fixes this issue, but I am not sure if I should change the existing aws_msk_scram_secret_association resource or create a new aws_msk_single_scram_secret_association resource that looks like this:
resource "aws_msk_single_scram_secret_association" "some-secret-association" {
cluster_arn = data.aws_msk_cluster.some-cluster.arn
secret_arn = aws_secretsmanager_secret.some-secret.arn
}
Changing the existing resource would introduce a breaking change, unless we make the secret_arn_list field optional and introduce a second secret_arn field. One of the two fields needs to be set for validation to succeed.
If we are going with the new resource, feel free to suggest a suitable name.
What would be the best approach going forward?
Thank you @jandillenkofer.
I vote for going with the same resource and secret_arn_list. While theoretically it is indeed a breaking change, I do not see it being a problem in any actual usage scenario (unless someone is using some really nasty workarounds here, maybe).
Thanks @jandillenkofer!
While having the secret_arn_list would be nice, I think also having a resource where you can do a single association would be amazing as well, kinda like how there are single ones for aws_vpc_security_group_ingress_rule and the like.
What would be the difference between having aws_msk_single_scram_secret_association vs using the existing resource and putting only one ARN in secret_arn_list, though?
The difference between aws_msk_single_scram_secret_association and aws_msk_scram_secret_association with just one entry in secret_arn_list would be that removing the aws_msk_single_scram_secret_association would remove just that one connection between the secret and the cluster. When removing the aws_msk_scram_secret_association from the cluster, all associations on the cluster are lost.
I created this fork with an example implementation of aws_msk_single_scram_secret_association. But I am not sure if we can just add a new resource, because the interaction between aws_msk_single_scram_secret_association and aws_msk_scram_secret_association is kinda weird.
Changing the existing resource is complicated as well, because we would need to migrate the old state of the existing resource to a new state with multiple entries, each with a single secret_arn. And it would definitely be a breaking change.
In our use case, we would also favour aws_msk_single_scram_secret_association, as it allows different parts of the code to add/remove associations independently, while the list approach requires all changes to be made by the same part of the code. As mentioned above, this is also similar to what is done in other similar areas of the AWS Terraform world (VPC, for instance).
From @jandillenkofer's initial explanation I understood that it would be possible to keep only one resource but still maintain each SCRAM association separately.
They initially suggested it could be done via the same resource by changing secret_arn_list to be optional and adding a secret_arn field to manage single associations.
Wouldn't it be possible to keep only the current resource and only the secret_arn_list field, but adjust the behaviour so that it only associates and disassociates secrets that were added with that specific resource? This way we could still manage each secret separately if we want, and we could get rid of the current awful behaviour, where one resource can override another resource's associations (which goes against Terraform's best practices, tbh).
This is what I have in mind:
# Associating just a single secret:
resource "aws_msk_scram_secret_association" "single" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn_list = [aws_secretsmanager_secret.single.arn]
depends_on = [aws_secretsmanager_secret_version.single]
}
# Associating a bunch of secrets:
resource "aws_msk_scram_secret_association" "bulk" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn_list = [
aws_secretsmanager_secret.bulk[0].arn,
aws_secretsmanager_secret.bulk[1].arn,
aws_secretsmanager_secret.bulk[2].arn
]
depends_on = [aws_secretsmanager_secret_version.bulk]
}
After applying this, there would be a total of 4 associations made. aws_msk_scram_secret_association.single does not override aws_msk_scram_secret_association.bulk or vice versa.
This would be a breaking change, because it fundamentally changes the way this resource currently works. But arguably, the current version is heavily flawed also.
PS. Not sure if important, but no other resource name in the AWS provider has single in it, even though the mechanism is somewhat similar (e.g. aws_vpc_security_group_ingress_rule).
@ppihus complementing my previous explanation: I'm sure it would still be useful to keep the current behaviour for the use cases where you need to control the full set (to remove all, overwrite, or make sure the content is exactly that). I think that to have both behaviours (full and independent) we would need either separate resources or, at least, a flag/field that modifies the current resource (e.g. mode: incremental/full, or another relevant name).
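Purely as a sketch of that idea, something like the following (the association_mode attribute does not exist today; the name is made up):
resource "aws_msk_scram_secret_association" "example" {
  cluster_arn     = aws_msk_cluster.example.arn
  secret_arn_list = [aws_secretsmanager_secret.example.arn]

  # "incremental" would manage only the ARNs listed here;
  # "full" would keep today's behaviour of owning the whole set.
  association_mode = "incremental"
}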
I understand, but this violates the key principle of idempotency and creates a great risk of configuration drift because of the "last writer wins" pattern.
Are there any other resources in the AWS provider that allow this kind of behaviour?
So @ppihus, I've been thinking about this. What about keeping the resource name the same but adding a new attribute that is just secret_arn, to use when you want to associate a single secret vs using the list option?
So like this:
resource "aws_msk_scram_secret_association" "single" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn = aws_secretsmanager_secret.single.arn
depends_on = [aws_secretsmanager_secret_version.single]
}
Can you elaborate on why this would be better than just having one item in the secret_arn_list?
Also a question about a theoretical scenario. What if you initially have:
resource "aws_msk_scram_secret_association" "example" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn = aws_secretsmanager_secret.single.arn
depends_on = [aws_secretsmanager_secret_version.single]
}
but then change this resource to:
resource "aws_msk_scram_secret_association" "example" {
cluster_arn = aws_msk_cluster.example.arn
secret_arn_list = [aws_secretsmanager_secret.single.arn]
depends_on = [aws_secretsmanager_secret_version.single]
}
Would tf recreate this association even though the end result is exactly the same?
@ppihus I see your point there. I honestly think that the problem is with how it's stored in the state right now, because if you only link one item, the state ends up with everything else that is already associated, thus breaking everything on the next run.
Maybe that's the real fix here: making sure that the state only cares about what was passed in and not everything else that comes back from the aws kafka batch-associate-scram-secret call.
I just found the following comment from @bflad in a PR from 2020.
... If there really is a strong use case for 1:1 management like other association resources, it may be best to consider something along the lines of the following to treat that use case separately in the logic:
- Mark secret_arn_list as Optional + Computed
- Add secret_arn as new TypeString + Optional + Computed + ConflictsWith attribute
- In the import function, support an optional delimited import ID that triggers secondary behavior of d.Set("secret_arn", ...) and d.SetId() with the cluster ARN
- In the read function, perform d.Get("secret_arn") check against the list of secret ARNs to trigger resource recreation
Would this be a fitting solution to our problem?
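If it helps the discussion, here is a rough sketch of what that attribute layout could look like with the Plugin SDK (illustrative only, not the actual provider code):
package msk

import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// Sketch of the layout suggested above: secret_arn_list becomes
// Optional + Computed, and a new single-value secret_arn conflicts with it.
func scramSecretAssociationSchema() map[string]*schema.Schema {
    return map[string]*schema.Schema{
        "cluster_arn": {
            Type:     schema.TypeString,
            Required: true,
            ForceNew: true,
        },
        "secret_arn_list": {
            Type:          schema.TypeSet,
            Optional:      true,
            Computed:      true,
            Elem:          &schema.Schema{Type: schema.TypeString},
            ConflictsWith: []string{"secret_arn"},
        },
        "secret_arn": {
            Type:          schema.TypeString,
            Optional:      true,
            Computed:      true,
            ConflictsWith: []string{"secret_arn_list"},
        },
    }
}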