terraform-provider-kafka

Topic deleted from state store when ACL on topic is deleted

Open mchai opened this issue 6 years ago • 11 comments

It seems that when both a topic and a related ACL are defined in Terraform, subsequently deleting the ACL definition removes the topic from the Terraform state. Terraform then attempts to create the topic, which still exists on the broker, and the apply fails. Example scenario below:

~/kafka_acl_issue $ ls -l
total 20
-rw-r--r-- 1 mat mat 233 May 28 21:07 acl.tf
-rw-r--r-- 1 mat mat 109 May 28 21:12 providers.tf
-rw-r--r-- 1 mat mat 156 May 28 21:39 terraform.tfstate
-rw-r--r-- 1 mat mat 564 May 28 21:39 terraform.tfstate.backup
-rw-r--r-- 1 mat mat 126 May 28 21:06 topic.tf

~/kafka_acl_issue $ cat topic.tf 
resource "kafka_topic" "test_topic" {
  name               = "test.topic"
  replication_factor = 1
  partitions         = 1
}

~/kafka_acl_issue $ cat acl.tf 
resource "kafka_acl" "test_acl" {
  resource_name       = "test.topic"
  resource_type       = "Topic"
  acl_principal       = "User:Alice"
  acl_host            = "*"
  acl_operation       = "Write"
  acl_permission_type = "Deny"
}

~/kafka_acl_issue $ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # kafka_acl.test_acl will be created
  + resource "kafka_acl" "test_acl" {
      + acl_host                     = "*"
      + acl_operation                = "Write"
      + acl_permission_type          = "Deny"
      + acl_principal                = "User:Alice"
      + id                           = (known after apply)
      + resource_name                = "test.topic"
      + resource_pattern_type_filter = "Literal"
      + resource_type                = "Topic"
    }

  # kafka_topic.test_topic will be created
  + resource "kafka_topic" "test_topic" {
      + id                 = (known after apply)
      + name               = "test.topic"
      + partitions         = 1
      + replication_factor = 1
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

kafka_topic.test_topic: Creating...
kafka_acl.test_acl: Creating...
kafka_topic.test_topic: Creation complete after 0s [id=test.topic]
kafka_acl.test_acl: Creation complete after 0s [id=User:Alice|*|Write|Deny|Topic|test.topic|Literal]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

~/kafka_acl_issue $ mv acl.tf acl.tf.disabled

~/kafka_acl_issue $ terraform apply
kafka_acl.test_acl: Refreshing state... [id=User:Alice|*|Write|Deny|Topic|test.topic|Literal]
kafka_topic.test_topic: Refreshing state... [id=test.topic]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
  - destroy

Terraform will perform the following actions:

  # kafka_acl.test_acl will be destroyed
  - resource "kafka_acl" "test_acl" {
      - acl_host                     = "*" -> null
      - acl_operation                = "Write" -> null
      - acl_permission_type          = "Deny" -> null
      - acl_principal                = "User:Alice" -> null
      - id                           = "User:Alice|*|Write|Deny|Topic|test.topic|Literal" -> null
      - resource_name                = "test.topic" -> null
      - resource_pattern_type_filter = "Literal" -> null
      - resource_type                = "Topic" -> null
    }

  # kafka_topic.test_topic will be created
  + resource "kafka_topic" "test_topic" {
      + id                 = (known after apply)
      + name               = "test.topic"
      + partitions         = 1
      + replication_factor = 1
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

kafka_acl.test_acl: Destroying... [id=User:Alice|*|Write|Deny|Topic|test.topic|Literal]
kafka_topic.test_topic: Creating...
kafka_acl.test_acl: Destruction complete after 0s

Error: kafka server: Topic with this name already exists.

  on topic.tf line 1, in resource "kafka_topic" "test_topic":
   1: resource "kafka_topic" "test_topic" {


~/kafka_acl_issue $ cat terraform.tfstate
{
  "version": 4,
  "terraform_version": "0.12.0",
  "serial": 11,
  "lineage": "9fa0063f-476c-8bfb-0002-9ebc7478dd46",
  "outputs": {},
  "resources": []
}
~/kafka_acl_issue $ 
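
If the topic has dropped out of the state while still existing on the broker, as in the empty terraform.tfstate above, one possible way to re-adopt it rather than recreate it is an import. A minimal sketch, assuming Terraform 1.5 or later (the session above uses 0.12.0, where the `terraform import` command would be the equivalent) and assuming the provider version in use supports importing kafka_topic by topic name, which matches the id shown in the apply output:

```hcl
# Assumption: import blocks require Terraform >= 1.5.
# Assumption: the provider accepts the topic name as the import id,
# mirroring [id=test.topic] in the apply log above.
import {
  to = kafka_topic.test_topic
  id = "test.topic"
}
```

This only helps if the principal running Terraform can still see the topic; otherwise the next refresh would likely drop it from the state again.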

mchai, May 28 '19 12:05

🤔 I'm having a pretty hard time tracking down why this happens. It seems like once we create the ACL for the topic, Kafka no longer includes the topic in its list of topics, and as such, it appears to have been deleted.

Mongey, Jun 04 '19 14:06

Oh. I think I see what could be going on. It might be related to the broker setting allow.everyone.if.no.acl.found.

As soon as we place an ACL on a topic, only superusers and those users granted access via an ACL can see it. So we need to ensure that whichever user is running the terraform job can see the topic after an ACL is applied to it.
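
One way to express this in configuration is to grant the Terraform principal access to the topic before any restrictive ACL is attached. A rough sketch, where User:terraform is a hypothetical principal name (not taken from this issue), "All" is likely broader than strictly needed, and depends_on is used only to force the ordering:

```hcl
# Hypothetical principal: replace User:terraform with whatever identity
# the Terraform run authenticates as.
resource "kafka_acl" "terraform_topic_access" {
  resource_name       = "test.topic"
  resource_type       = "Topic"
  acl_principal       = "User:terraform"
  acl_host            = "*"
  acl_operation       = "All"
  acl_permission_type = "Allow"
}

resource "kafka_acl" "test_acl" {
  resource_name       = "test.topic"
  resource_type       = "Topic"
  acl_principal       = "User:Alice"
  acl_host            = "*"
  acl_operation       = "Write"
  acl_permission_type = "Deny"

  # Create the Terraform principal's grant first so the topic stays
  # visible to Terraform once this ACL exists.
  depends_on = [kafka_acl.terraform_topic_access]
}
```

Because destroy order is the reverse of create order, removing test_acl later would also happen while the Terraform principal's grant is still in place.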

mchai, Jun 05 '19 07:06

Just wanted to note that I was having the same issue with allow.everyone.if.no.acl.found set to true. From what I can tell, Terraform basically "locks itself out" after attaching the ACL to the topic, which puts the provider into a weird state (for me this included deleting the topic, and then later not recreating it after incorrectly concluding that it already existed).

I seem to be getting around this problem by first defining an ACL that gives the user running the Terraform script access to topics. Making this user a superuser is likely also a solution, but that option is unfortunately not available to me as I'm using AWS MSK.

This is what worked for me (in this case with an anonymous user, but the same principle should apply with a specific one):

resource "kafka_acl" "terraform-access-topics" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:ANONYMOUS"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

larsbrekken, Jun 20 '19 17:06

@larsbrekken just wondering, have you been able to create a kind of superadmin user in MSK that has access to all topics upfront? I'm struggling to understand how I would create an admin user in MSK, as it's not documented anywhere on the AWS side.

Constantin07, Mar 19 '20 17:03

@Constantin07 I created a terraform user that we use when running terraform, plus an admin user that we can use just in case. This has been working well for us.

resource "kafka_acl" "terraform-topic" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:CN=terraform-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

# Terraform can perform all group operations
resource "kafka_acl" "terraform-group" {
  resource_name       = "*"
  resource_type       = "Group"
  acl_principal       = "User:CN=terraform-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

# The admin user can perform all topic operations
resource "kafka_acl" "admin-topic" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

# The admin user can perform all group operations
resource "kafka_acl" "admin-group" {
  resource_name       = "*"
  resource_type       = "Group"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

# The admin user can perform all transactional operations
resource "kafka_acl" "admin-txid" {
  resource_name       = "*"
  resource_type       = "TransactionalID"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
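
Since these blocks differ only in principal and resource type, the same grants could also be generated from a map. This is just a sketch of an alternative layout, assuming Terraform 0.12.6 or later for resource-level for_each; the local names full_access and acl_grants are made up for illustration:

```hcl
locals {
  # Principal => resource types it gets full access to
  # (same pairs as the individual blocks above).
  full_access = {
    "User:CN=terraform-user" = ["Topic", "Group"]
    "User:CN=admin-user"     = ["Topic", "Group", "TransactionalID"]
  }

  # Flatten into one map entry per (principal, resource type) pair.
  acl_grants = {
    for pair in flatten([
      for principal, types in local.full_access : [
        for t in types : { principal = principal, resource_type = t }
      ]
    ]) : "${pair.principal}/${pair.resource_type}" => pair
  }
}

resource "kafka_acl" "full_access" {
  for_each = local.acl_grants

  resource_name       = "*"
  resource_type       = each.value.resource_type
  acl_principal       = each.value.principal
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
```

Note that switching an existing configuration to this form changes the resource addresses, so Terraform would plan to destroy and recreate the ACLs unless the state is moved accordingly.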

larsbrekken, Mar 19 '20 18:03

Thanks a lot @larsbrekken, much appreciated. Do you know if this is required as well?

resource "kafka_acl" "admin_cluster" {
  resource_name       = "*"
  resource_type       = "Cluster"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

When I try to add it to the MSK cluster I get:

kafka_acl.admin_cluster: Creating...

Error: kafka server: This most likely occurs because of a request being malformed by the client library or the message was sent to an incompatible broker. See the broker logs for more details.

  on main.tf line 67, in resource "kafka_acl" "admin_cluster":
  67: resource "kafka_acl" "admin_cluster" {


Constantin07, Mar 20 '20 17:03

@Constantin07 Sorry, I'm not familiar with the Cluster resource type. I searched our scripts and we're not defining that anywhere.

In case you missed it, broker logs are available in MSK now (you can e.g. direct them to an S3 bucket). Perhaps reviewing those would give you enough information to resolve the issue.

larsbrekken, Mar 23 '20 21:03

Thanks @larsbrekken

Constantin07, Mar 23 '20 23:03

I'm not familiar with the Cluster resource type

If you don't add a Cluster ACL, all the other ACLs are useless: any principal could connect and change ACLs via the Kafka cluster admin API, since AWS MSK sets allow.everyone.if.no.acl.found to true by default.

@Mongey am I right?

azhurbilo, Aug 15 '20 00:08

Thanks a lot @larsbrekken Much appreciated. Do you know if this is required as well ?

resource "kafka_acl" "admin_cluster" {
  resource_name       = "*"
  resource_type       = "Cluster"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}

When I try to add it to MSK cluster I get:

kafka_acl.admin_cluster: Creating...

Error: kafka server: This most likely occurs because of a request being malformed by the client library or the message was sent to an incompatible broker. See the broker logs for more details.

  on main.tf line 67, in resource "kafka_acl" "admin_cluster":
  67: resource "kafka_acl" "admin_cluster" {

Try setting resource_name to "kafka-cluster" instead of "*"; your user will then have access at the whole-cluster level.
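
Applied to the admin_cluster block from the earlier comment, that suggestion would look roughly like the sketch below. "kafka-cluster" is the fixed name Kafka uses for the cluster resource; whether this change alone resolves the MSK error is not confirmed anywhere in this thread.

```hcl
resource "kafka_acl" "admin_cluster" {
  # Cluster ACLs target the single cluster resource by its fixed name
  # rather than the "*" wildcard used for topics and groups.
  resource_name       = "kafka-cluster"
  resource_type       = "Cluster"
  acl_principal       = "User:CN=admin-user"
  acl_operation       = "All"
  acl_permission_type = "Allow"
  acl_host            = "*"
}
```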

thennati, May 05 '23 03:05

@thennati I'm afraid I won't be able to help you with that. It's been a couple of years since I worked on this specifically, and I don't work with Kafka or Terraform at the moment. Good luck, though!

larsbrekken, May 16 '23 16:05