
Allow provider definition to be dependent on other resources

Open pascalwhoop opened this issue 4 years ago • 5 comments

With the following config:

provider "kafka" {
  bootstrap_servers = split(",", var.bootstrap_brokers)
  version = "0.2.5"
  ca_cert = aws_acmpca_certificate_authority.kafka-authentication-authority.certificate
    client_cert       = local_file.cert.filename
    client_key        = local_file.key.filename
//    client_cert       = module.kafka_root_certificate.secretmanager_certificate_secret
//    client_key        = module.kafka_root_certificate.secretmanager_private_key_secret
//  client_cert = file("${path.root}/.terraform/tmp/cert.pem")
//  client_key = file("${path.root}/.terraform/tmp/key.pem")
}

resource "local_file" "cert" {
  filename = "${path.root}/.terraform/tmp/cert.pem"
  sensitive_content = module.kafka_root_certificate.secretmanager_certificate_secret
}
resource "local_file" "key" {
  filename = "${path.root}/.terraform/tmp/key.pem"
  sensitive_content = module.kafka_root_certificate.secretmanager_private_key_secret
}

the provider crashes when trying to refresh ACL resources in the same module:

module.kafka-pca.kafka_acl.admin_clusterwide: Refreshing state... [id=User:CN=mskadmin|*|All|Allow|Topic|*|Literal]
module.kafka-pca.kafka_acl.brokers_clusterwide: Refreshing state... [id=User:CN=*.ebbr-dev-kafka-dev.zv7s7p.c4.kafka.eu-central-1.amazonaws.com|*|All|Allow|Topic|*|Literal]

Error: rpc error: code = Unavailable desc = transport is closing

Error: rpc error: code = Unavailable desc = transport is closing

pascalwhoop avatar Jun 08 '20 21:06 pascalwhoop

@pascalwhoop 🤔

client_cert       = local_file.cert.filename

This is incorrect: it should be the contents of the file, not its location. The commented-out value seems correct.

//    client_cert       = module.kafka_root_certificate.secretmanager_certificate_secret

Why create the tmp local file?
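
For reference, a minimal sketch of passing contents instead of paths (assuming the module outputs are PEM-encoded strings; the local_file resource also exposes what it wrote via its sensitive_content attribute):

provider "kafka" {
  bootstrap_servers = split(",", var.bootstrap_brokers)
  ca_cert           = aws_acmpca_certificate_authority.kafka-authentication-authority.certificate

  # PEM contents, not a path on disk
  client_cert = module.kafka_root_certificate.secretmanager_certificate_secret
  client_key  = module.kafka_root_certificate.secretmanager_private_key_secret
}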

Mongey avatar Jun 08 '20 21:06 Mongey

Ah, the tmp file was an attempt to work around the issue. The commented-out variant gives the same error, @Mongey.

Sorry, it's been 14h today, I'm a bit fried :grin:

The ACL that the provider is attempting to read is:

resource "kafka_acl" "admin_clusterwide" {
  depends_on = [module.kafka_root_certificate]
  resource_name = "*"
  resource_type = "Topic"
  acl_operation = "All"
  acl_host = "*"
  acl_permission_type = "Allow"
  acl_principal = "User:CN=${local.cert_cn}"
}

This, of course, fails if the provider is somehow initialized incorrectly.

pascalwhoop avatar Jun 08 '20 22:06 pascalwhoop

I think this is probably a TLS issue rather than a problem with passing in values from another provider / module.

This works for me with 0.2.5:

provider "consul" {}

data "consul_keys" "kafka_servers" {
  datacenter = "dc1"

  key {
    name = "kafka"
    path = "kafka"
  }

  key {
    name = "ca_cert"
    path = "ca_cert"
  }

  key {
    name = "client_cert"
    path = "client_cert"
  }

  key {
    name = "client_key"
    path = "client_key"
  }
  key {
    name = "tls_enabled"
    path = "tls_enabled"
  }
}

provider "kafka" {
  bootstrap_servers = [data.consul_keys.kafka_servers.var.kafka]

  ca_cert     = data.consul_keys.kafka_servers.var.ca_cert
  client_cert = data.consul_keys.kafka_servers.var.client_cert
  client_key  = data.consul_keys.kafka_servers.var.client_key
  tls_enabled = tobool(data.consul_keys.kafka_servers.var.tls_enabled)

}

# Make sure we don't lock ourselves out on the first run of terraform.
# First grant ourselves admin permissions, then add the ACL for the topic.
resource "kafka_acl" "global" {
  resource_name       = "*"
  resource_type       = "Topic"
  acl_principal       = "User:*"
  acl_host            = "*"
  acl_operation       = "All"
  acl_permission_type = "Allow"
}

resource "kafka_topic" "syslog" {
  name               = "syslog"
  replication_factor = 1
  partitions         = 4

  config = {
    "segment.ms"   = "4000"
    "retention.ms" = "86400000"
  }

  depends_on = [kafka_acl.global]
}

resource "kafka_acl" "test" {
  resource_name       = "syslog"
  resource_type       = "Topic"
  acl_principal       = "User:Alice"
  acl_host            = "*"
  acl_operation       = "Write"
  acl_permission_type = "Deny"

  depends_on = [kafka_acl.global]
}

Mongey avatar Jun 08 '20 22:06 Mongey

I believe there is a difference between a data node and a resource node. The errors I get happen during the plan phase. I’ll dig into the logs today.
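
To illustrate that distinction (a minimal sketch; the secret name is hypothetical): a data source is read during plan/refresh, while a managed resource's attributes are unknown until it has been applied, so a provider configured from them starts out with unknown values on the first run.

# Data source: read during plan/refresh, so its value is available to a
# provider block (assuming the secret already exists out-of-band).
data "aws_secretsmanager_secret_version" "client_cert" {
  secret_id = "dev/kafka/client_cert" # hypothetical secret name
}

# Managed resource: on the first run its attributes are only known after
# apply, so a provider configured from them sees unknown values at plan time.
resource "local_file" "cert" {
  filename          = "${path.root}/.terraform/tmp/cert.pem"
  sensitive_content = data.aws_secretsmanager_secret_version.client_cert.secret_string
}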

pascalwhoop avatar Jun 09 '20 06:06 pascalwhoop

Alright, I just thought about this a bit further. Here's the chain I have in full, and I don't see how it could work (but I would also expect terraform to be able to detect this):

  1. I create a private certificate via an AWS PCA. This requires a null_resource and an external data block, because the aws provider doesn't yet support these operations.
# 3. prep a secret and write signed certificate arn to the secret
resource "aws_secretsmanager_secret" "certificate_arn" {
  name = "dev/kafka/cert_arn/${var.project_name}"
}

resource "null_resource" "signing_request" {
  # needs the secret to be created first, because the script changes its value
  triggers = {
    request = tls_cert_request.signing_request.cert_request_pem
  }
  provisioner "local-exec" {
    command = "${path.module}/signing_request.sh"
    environment = {
      PCA_ARN     = var.pca_arn
      CSR         = tls_cert_request.signing_request.cert_request_pem
      CERT_ARN_ID = aws_secretsmanager_secret.certificate_arn.id
    }
  }

}

# 4. grab the certificate arn written by the null_resource above and read its value
data "aws_secretsmanager_secret_version" "certificate_arn" {
  depends_on = [null_resource.signing_request]
  secret_id  = aws_secretsmanager_secret.certificate_arn.id
}

data "external" "certificate_value" {
  program = ["${path.module}/certificate_value.sh"]
  query = {
    pca_arn  = var.pca_arn
    cert_arn = data.aws_secretsmanager_secret_version.certificate_arn.secret_string
  }
}
  2. The result of data.external.certificate_value is what gets passed to the provider. I added a touch /tmp/I_WAS_HERE to the certificate_value.sh shell script, but the file never gets created, meaning the external data source isn't run during the plan phase.

So the provider can't have a certificate. It fails, but without telling me much about why. Digging into the logs I see skipping TLS client config, which apparently gets logged if either the cert or the key is missing (config.go):

if clientCert != "" && clientKey != ""

Hence, I guess this issue tells us two things:

  1. Ideally, we bubble up to the user the fact that one of those two values is missing or wrong, e.g. "client_key provided is not a private key" or similar.
  2. Either terraform or the provider has an issue resolving the external data before the provider is initialized. It just skips it silently, which feels odd. (A possible workaround is sketched below.)
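
A minimal workaround sketch, not a confirmed fix: configure the provider only from data sources, which Terraform reads during plan/refresh, rather than from managed resources whose values are unknown until apply. The secret names below are hypothetical and assume the PEM contents themselves are stored in Secrets Manager:

data "aws_secretsmanager_secret_version" "client_cert" {
  secret_id = "dev/kafka/client_cert" # hypothetical secret name
}

data "aws_secretsmanager_secret_version" "client_key" {
  secret_id = "dev/kafka/client_key" # hypothetical secret name
}

provider "kafka" {
  bootstrap_servers = split(",", var.bootstrap_brokers)
  ca_cert           = aws_acmpca_certificate_authority.kafka-authentication-authority.certificate
  client_cert       = data.aws_secretsmanager_secret_version.client_cert.secret_string
  client_key        = data.aws_secretsmanager_secret_version.client_key.secret_string
}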

pascalwhoop avatar Jun 09 '20 08:06 pascalwhoop