terraform-provider-kafka
Topic deleted from state store when ACL on topic is deleted
When both a topic and a related ACL are defined in Terraform and you later remove the ACL definition, Terraform drops the topic from its state store and then tries to create the same topic again, which fails because the topic already exists. Example scenario below:
~/kafka_acl_issue $ ls -l
total 20
-rw-r--r-- 1 mat mat 233 May 28 21:07 acl.tf
-rw-r--r-- 1 mat mat 109 May 28 21:12 providers.tf
-rw-r--r-- 1 mat mat 156 May 28 21:39 terraform.tfstate
-rw-r--r-- 1 mat mat 564 May 28 21:39 terraform.tfstate.backup
-rw-r--r-- 1 mat mat 126 May 28 21:06 topic.tf
~/kafka_acl_issue $ cat topic.tf
resource "kafka_topic" "test_topic" {
  name               = "test.topic"
  replication_factor = 1
  partitions         = 1
}
~/kafka_acl_issue $ cat acl.tf
resource "kafka_acl" "test_acl" {
  resource_name       = "test.topic"
  resource_type       = "Topic"
  acl_principal       = "User:Alice"
  acl_host            = "*"
  acl_operation       = "Write"
  acl_permission_type = "Deny"
}
~/kafka_acl_issue $ terraform apply
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# kafka_acl.test_acl will be created
+ resource "kafka_acl" "test_acl" {
+ acl_host = "*"
+ acl_operation = "Write"
+ acl_permission_type = "Deny"
+ acl_principal = "User:Alice"
+ id = (known after apply)
+ resource_name = "test.topic"
+ resource_pattern_type_filter = "Literal"
+ resource_type = "Topic"
}
# kafka_topic.test_topic will be created
+ resource "kafka_topic" "test_topic" {
+ id = (known after apply)
+ name = "test.topic"
+ partitions = 1
+ replication_factor = 1
}
Plan: 2 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
kafka_topic.test_topic: Creating...
kafka_acl.test_acl: Creating...
kafka_topic.test_topic: Creation complete after 0s [id=test.topic]
kafka_acl.test_acl: Creation complete after 0s [id=User:Alice|*|Write|Deny|Topic|test.topic|Literal]
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
~/kafka_acl_issue $ mv acl.tf acl.tf.disabled
~/kafka_acl_issue $ terraform apply
kafka_acl.test_acl: Refreshing state... [id=User:Alice|*|Write|Deny|Topic|test.topic|Literal]
kafka_topic.test_topic: Refreshing state... [id=test.topic]
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
- destroy
Terraform will perform the following actions:
# kafka_acl.test_acl will be destroyed
- resource "kafka_acl" "test_acl" {
- acl_host = "*" -> null
- acl_operation = "Write" -> null
- acl_permission_type = "Deny" -> null
- acl_principal = "User:Alice" -> null
- id = "User:Alice|*|Write|Deny|Topic|test.topic|Literal" -> null
- resource_name = "test.topic" -> null
- resource_pattern_type_filter = "Literal" -> null
- resource_type = "Topic" -> null
}
# kafka_topic.test_topic will be created
+ resource "kafka_topic" "test_topic" {
+ id = (known after apply)
+ name = "test.topic"
+ partitions = 1
+ replication_factor = 1
}
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
kafka_acl.test_acl: Destroying... [id=User:Alice|*|Write|Deny|Topic|test.topic|Literal]
kafka_topic.test_topic: Creating...
kafka_acl.test_acl: Destruction complete after 0s
Error: kafka server: Topic with this name already exists.
on topic.tf line 1, in resource "kafka_topic" "test_topic":
1: resource "kafka_topic" "test_topic" {
~/kafka_acl_issue $ cat terraform.tfstate
{
  "version": 4,
  "terraform_version": "0.12.0",
  "serial": 11,
  "lineage": "9fa0063f-476c-8bfb-0002-9ebc7478dd46",
  "outputs": {},
  "resources": []
}
~/kafka_acl_issue $
🤔 I'm having a pretty hard time tracking down why this happens. It seems like once we create the ACL for the topic, Kafka no longer includes the topic in its list of topics, and as such, it appears to have been deleted.
Oh. I think I see what could be going on. It might be related to the broker setting allow.everyone.if.no.acl.found
As soon as we place an ACL on a topic, only superusers and those users granted access via an ACL can see it. So we need to ensure that whichever user is running the terraform job can see the topic after an ACL is applied to it.
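One way to apply this to the scenario above is to give the principal that runs Terraform its own Allow ACL on the topic and make the restrictive ACL wait for it. This is only a sketch: the principal name "User:terraform" and the resource name terraform_test_topic are assumptions for illustration, and "All" may be broader than you need.

```hcl
# Hypothetical principal: replace "User:terraform" with whichever
# user actually runs Terraform against the cluster.
resource "kafka_acl" "terraform_test_topic" {
  resource_name       = "test.topic"
  resource_type       = "Topic"
  acl_principal       = "User:terraform"
  acl_host            = "*"
  acl_operation       = "All"
  acl_permission_type = "Allow"
}

# The ACL from the original report, unchanged except for depends_on.
resource "kafka_acl" "test_acl" {
  resource_name       = "test.topic"
  resource_type       = "Topic"
  acl_principal       = "User:Alice"
  acl_host            = "*"
  acl_operation       = "Write"
  acl_permission_type = "Deny"

  # Create this ACL only after Terraform's own access is in place.
  depends_on = [kafka_acl.terraform_test_topic]
}
```

Because Terraform destroys resources in reverse dependency order, the depends_on also means the Deny ACL is removed before Terraform's own Allow ACL.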
Just wanted to note that I was having the same issue with allow.everyone.if.no.acl.found set to true. From what I can tell, Terraform basically "locks itself out" after attaching the ACL to the topic, which causes the provider to get into a weird state (for me this included deleting the topic, and then later not recreating it after incorrectly concluding that it already existed).
I seem to be getting around this problem by first defining an ACL that gives the user I'm running the Terraform script with access to topics. Making this user a superuser is likely also a solution, but that option is unfortunately not available to me as I'm using AWS MSK.
This is what worked for me (in this case with an anonymous user, but the same principle should apply with a specific one):
resource "kafka_acl" "terraform-access-topics" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:ANONYMOUS"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
@larsbrekken just wondering, have you been able to create a kind of superadmin user in MSK that has access to all topics up front? I'm struggling to understand how I would create an admin user in MSK, as it's not documented anywhere on the AWS side.
@Constantin07 I created a terraform user that we use when running terraform, plus an admin user that we can use just in case. This has been working well for us.
# Terraform can perform all topic operations
resource "kafka_acl" "terraform-topic" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:CN=terraform-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
# Terraform can perform all group operations
resource "kafka_acl" "terraform-group" {
  resource_name       = "*"
  resource_type       = "Group"
  acl_principal       = "User:CN=terraform-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
# The admin user can perform all topic operations
resource "kafka_acl" "admin-topic" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
# The admin user can perform all group operations
resource "kafka_acl" "admin-group" {
  resource_name       = "*"
  resource_type       = "Group"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
# The admin user can perform all transactional operations
resource "kafka_acl" "admin-txid" {
  resource_name       = "*"
  resource_type       = "TransactionalID"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
Thanks a lot @larsbrekken, much appreciated. Do you know if this is required as well?
resource "kafka_acl" "admin_cluster" {
  resource_name       = "*"
  resource_type       = "Cluster"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
When I try to add it to the MSK cluster I get:
kafka_acl.admin_cluster: Creating...
Error: kafka server: This most likely occurs because of a request being malformed by the client library or the message was sent to an incompatible broker. See the broker logs for more details.
on main.tf line 67, in resource "kafka_acl" "admin_cluster":
67: resource "kafka_acl" "admin_cluster" {
@Constantin07 Sorry, I'm not familiar with the Cluster resource type. I searched our scripts and we're not defining that anywhere.
In case you missed it, broker logs are available in MSK now (you can e.g. direct them to an S3 bucket). Perhaps reviewing those would give you enough information to resolve the issue.
Thanks @larsbrekken
"I'm not familiar with the Cluster resource type"
If you don't add a Cluster ACL, all other ACLs are useless, as any principal could connect and change ACLs via the Kafka cluster admin API (AWS MSK sets allow.everyone.if.no.acl.found to true by default).
@Mongey am I right?
Try resource_name = "kafka-cluster"; your user will then have access at the whole-cluster level.
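Applied to the admin_cluster resource from the question, the suggestion above might look like the sketch below. It assumes that the "malformed request" error came from using "*" as the name of a Cluster resource, which the thread does not confirm. Kafka itself names the single cluster resource "kafka-cluster"; everything else is copied from the question.

```hcl
# Sketch only: the same ACL as in the question, with the resource name
# changed from "*" to Kafka's fixed cluster resource name.
resource "kafka_acl" "admin_cluster" {
  resource_name       = "kafka-cluster"
  resource_type       = "Cluster"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
```

If the broker still rejects the request, the broker logs mentioned earlier in the thread are the place to look next.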
@thennati I'm afraid I won't be able to help you with that. It's been a couple of years since I worked on this specifically, and I don't work with Kafka or Terraform at the moment. Good luck, though!