terraform icon indicating copy to clipboard operation
terraform copied to clipboard

Don't silently convert object types to map(string) just because their attributes all have primitive types

Open jaffel-lc opened this issue 4 months ago • 1 comments

Terraform Version

Terraform 1.7.4

Terraform Configuration Files


// root module 

resource "random_string" "rando" {
  length  = 6
  special = false
  numeric = false
  upper   = false
}

module "kms" {
  source                  = "git::https://git.labcorp.com/scm/TFCOM/kms.git?ref=release/130.2.0"
  labels                  = "${var.labels}-${random_string.rando.result}"
  deletion_window_in_days = 7
  tags                    = local.tags
}

output "security_policy_rules" {  value = module.vpc-endpoint.security_policy_rules}
output "kms_key_config"        {  value = module.vpc-endpoint.kms_key_config}
output "key_config"           { value = module.vpc-endpoint.key_config }

output "aws_policy"      { value = module.vpc-endpoint.aws_policy }
output "aws_json"        { value = module.vpc-endpoint.aws_json }
output "customer_policy" { value = module.vpc-endpoint.customer_policy }
output "customer_json"   { value = module.vpc-endpoint.customer_json }
output "security_policy" { value = module.vpc-endpoint.security_policy }
output "security_json"   { value = module.vpc-endpoint.security_json }

module "vpc-endpoint" {
  source              = "../"
  create_vpc_endpoint = true
  labels              = var.labels
  tags                = local.tags
  collection_name     = "${var.labels}-vpce"
  vpc_endpoint = {
    name       = "value"
    subnet_ids = data.aws_subnets.main.ids
    vpc_id     = data.aws_vpc.main.id
  }
}

output "with-aws_policy"      { value = module.vpc-endpoint-with-kms.aws_policy }
output "with-aws_json"        { value = module.vpc-endpoint-with-kms.aws_json }
output "with-customer_policy" { value = module.vpc-endpoint-with-kms.customer_policy }
output "with-customer_json"   { value = module.vpc-endpoint-with-kms.customer_json }
output "with-security_policy" { value = module.vpc-endpoint-with-kms.security_policy }
output "with-security_json"   { value = module.vpc-endpoint-with-kms.security_json }

module "vpc-endpoint-with-kms" {
  source              = "../"
  create_vpc_endpoint = true
  labels              = var.labels
  tags                = local.tags
  collection_name     = "${var.labels}-vpce"
  use_customer_kms_key = true
  kms_key             = module.kms.key_arn
  vpc_endpoint = {
    name       = "value"
    subnet_ids = data.aws_subnets.main.ids
    vpc_id     = data.aws_vpc.main.id
  }
}

// module code

locals {
  security_pwith-olicy_rules = {
    Rules = [
      {
        Resource = [
          "collection/${local.collection_name}"
        ],
        ResourceType = "collection"
      }
    ]
  }
  kms_key_config = {
    AWSOwnedKey = {
      AWSOwnedKey = true
    }
    CustomerKey = {
      AWSOwnedKey = false
      KmsARN      = var.kms_key
    }
  }

  // this converts the Boolean true to String "true" 
  key_config = var.kms_key == "AWSOwnedKey" ? local.kms_key_config["AWSOwnedKey"] : local.kms_key_config["CustomerKey"]
  security_policy = merge(local.security_policy_rules, local.key_config)
  security_json   = jsonencode(local.security_policy)
  

  // this does not convert the Boolean true to String "true" 
  customer_policy = merge(local.security_policy_rules,local.kms_key_config["CustomerKey"])
  customer_json   = jsonencode(local.customer_policy)
  
  // this does not convert the Boolean false to String "false" 
  aws_policy = merge(local.security_policy_rules,local.kms_key_config["AWSOwnedKey"])
  aws_json        = jsonencode(local.aws_policy)

}


output "security_policy_rules" {  value = local.security_policy_rules}
output "kms_key_config"        {  value = local.kms_key_config}
output "aws_policy"      { value = local.aws_policy }
output "aws_json"        { value = local.aws_json }
output "customer_policy" { value = local.customer_policy }
output "customer_json"   { value = local.customer_json }

output "key_config"      { value = local.key_config }
output "security_policy" { value = local.security_policy }
output "security_json"   { value = local.security_json }

Debug Output

https://gist.github.com/jaffel-lc/4bb463fa4086c11efd7e283a2b05f366

Expected Behavior

output value with-security_policy,AWSOwnedKey == false output value security_policy,AWSOwnedKey == true

Actual Behavior

output value security_policy,AWSOwnedKey == "false" output value with-security_policy,AWSOwnedKey == "true"

These outputs are based on local variables in the module, and when I try to use those locals in an aws_opensearchserverless_security_policy as the value of the policy parameter I receive this error:

operation error OpenSearchServerless: CreateSecurityPolicy, https response error StatusCode: 400, RequestID: 086b1f49-5575-47ed-b5db-5c61546e240f, ValidationException: Policy json is invalid, error: [$.AWSOwnedKey: string found, boolean expected, │ $.AWSOwnedKey: must be a constant value true, $.KmsARN: is missing but it is required, $.AWSOwnedKey: must be a constant value false]

The difference between security_policy and aws_policy is that security_policy uses an additional kms_key_config map lookup into another local variable before the merge(), while aws_policy merges directly from kms_key_config In that extra lookup, the boolean value is being converted to string.

Here a gist with the outputs: https://gist.github.com/jaffel-lc/909a106c1c299e5a9f6e0be6c7d82b72

Steps to Reproduce

  1. terraform init
  2. terraform apply

Additional Context

n/a

References

n/a

jaffel-lc avatar Feb 23 '24 22:02 jaffel-lc

Hi @jaffel-lc,

I think in this case the problem is that Terraform is trying its best to find some way to make your conditional expression valid through automatic type conversions:

  key_config = var.kms_key == "AWSOwnedKey" ? local.kms_key_config["AWSOwnedKey"] : local.kms_key_config["CustomerKey"]

Terraform needs to decide what type local.key_config should have based on the conditional expression results. However, your two result arms have different types:

  • AWSOwnedKey is object({ AWSOwnedKey = bool })
  • CustomerKey is object({ AWSOwnedKey = bool, KMSArn = string })

Without any type conversions this conditional expression would be invalid because the two results are of different types. Terraform is noticing that it's possible to convert a boolean value to a string and therefore both of these can potentially convert to map(string), and is therefore guessing that you probably meant that data type. In this case, that is an incorrect guess.

I assume from your description that what you really wanted is for this local value to have the type object({ AWSOwnedKey = bool, KMSArn = string }). If so then you'll need to add some more information so that Terraform can better understand your intent:

  kms_key_config = tomap({
    AWSOwnedKey = {
      AWSOwnedKey = true
      KmsARN      = null
    }
    CustomerKey = {
      AWSOwnedKey = false
      KmsARN      = var.kms_key
    }
  })

This will cause local.kms_key_config to be of type map(object({ AWSOwnedKey = bool, KMSArn = string }), and therefore both of the elements will be of type object({ AWSOwnedKey = bool, KMSArn = string }, and thus local.key_config will also have that type. Your merge call should then introduce the extra attribute from local.security_policy_rules, causing the output value types you wanted.


This is, then, Terraform behaving as designed, rather than a bug.

This is an unfortunate situation where Terraform would ideally have returned a clear error about the type mismatch, but cannot do so because of backward compatibility with Terraform v0.11 and earlier, which didn't yet have either the number or bool types and so booleans were always represented as strings.

To avoid breaking existing modules that were relying on the ability to construct maps in this way, modern Terraform unfortunately tends to guess that authors intended to construct a map(string) whenever that's possible. That behavior is protected by the Terraform v1.x compatibility promises and so cannot change at least until a future edition of the Terraform language.

No new editions are planned at this time, but I'm going to relabel this as an enhancement and add it to our list of candidates for potential changes to make in a future language edition. (A new language edition is a relatively expensive proposition, so we're intending to wait until there are enough candidates to justify the additional complexity that would imply.)

apparentlymart avatar Feb 23 '24 23:02 apparentlymart

Your reply make a lot of sense. Thank you.

However, I had already found that when AWSOwnedKey is false, passing the KmsARN key - set to anything - caused a key error on KmsARN when amazon tries to verify the resulting policy JSON. AWS doesn't want a KmsARN key with any value - null or not - when AWSOwnedKey == true

So what I wanted was exactly the assignment I wrote, so that my code would contain the three lines starting with the key_config assignment, instead of the four lines that follow it.

I guess, that I would call it a misfeature, if not a bug, that terraform doesn't create the resulting variables type verbatim form the right side of the assignment; or at least track that the value was bool, and not string - maintaining the original bool type [with an indicator bit in your symbol table], and not converting it to string.

Because just prior to opening this issue, I opened this issue about TF not being able to use string comparisons on variables in count operations. So I would expect

count = var.somevar == "true"  ? 1 : 0

to generate a plan-time error, rendering string-encoded boolean values useless.

jaffel-lc avatar Feb 26 '24 14:02 jaffel-lc

Indeed, when you are constructing something that's going to be sent essentially verbatim as JSON (or similar) then you may need to postprocess it to transform it from the conventions of Terraform's type system to the conventions of whatever language you are targeting. That's a common problem when using types from one language to generate data for another language.

This is often managed in Terraform by providers offering functionality to automatically transform a Terraform-friendly data structure into an equivalent that's friendly to a particular remote system. However, that's been tough to design for a number of features due to the functionality typically needing to get embedded inside the implementation of a particular resource type.

Terraform v1.8 is going to introduce the possibility of providers to contribute their own functions (in the same sense that jsonencode is a function) and a big reason for that is to make it more convenient for a provider to offer a specialized encoding function like "encode an AWS IAM policy" or "encode a Kubernetes manifest" which can encapsulate this sort of translation logic so that you don't need to implement it yourself as part of a Terraform module.

Of course it will take some time for the provider ecosystem to make full use of this new capability, but what you've described here seems like a good use-case for a function in the AWS provider which knows the JSON schema that AWS is expecting in this context and produces it automatically.

apparentlymart avatar Feb 26 '24 16:02 apparentlymart

Indeed, when you are constructing something that's going to be sent essentially verbatim as JSON (or similar) then you may need to postprocess it to transform it from the conventions of Terraform's type system to the conventions of whatever language you are targeting. That's a common problem when using types from one language to generate data for another language.

Indeed, I consider the map manipulation under debate here to be pre-processing. What is at issue is TF performing implicit type conversion of a bool literal to a string literal, and that terraform should be smart enough to tell the difference between "true" and true, especially when "true" appears nowhere in my code at all. If I'd mixed them freely, I could maybe understand it in light of the promise you mentioned above.

Since HCL is dynamically-typed language , instead of getting confused about the what type to use on the left side of the assignment before evaluating the left side of the ternary operator, clone the result type from the results after the right side of the assignment is fully evaluated.

jaffel-lc avatar Feb 26 '24 16:02 jaffel-lc

The kind of pre-processing I meant was something that should happen after you've built a data structure in Terraform's type system.

By analogy to Go: it's typical to first build a data structure made mostly of struct types and then map that to correctly-shaped JSON as a separate step. Terraform object types are analogous to struct types, and so essentially the same principle applies here: you'd start by building a data structure made of Terraform object types (which come with the assumption that all attributes are always present but sometimes they are null) and then as a separate step transform that data structure into the JSON format that the underlying system requires, which in this case seems to mean ignoring any attribute whose value is null.

Trying to do both the building of the data structure and the adaptation for another system's encoding format at the same time makes things considerably more difficult, because you end up needing to work around Terraform's expression operators, rather than working with them.

I'm not sure exactly which AWS-specific format you are trying to generate here -- I assume it's not just an IAM policy document -- but the existing aws_iam_policy_document data source is an early example of what I meant in the above, albeit implemented as a data source rather than as a function because Terraform didn't support provider-contributed functions yet. But a future version of the AWS provider could offer a more convenient function that takes a value of a type that captures the same meaning as an IAM policy document but described using Terraform's type system, and then use that to produce a JSON encoding of that policy that conforms to the AWS expectations for that document format. It would presumably be defined to take object types with optional attributes (that therefore get set to null when not explicitly defined) and would encapsulate the IAM-policy-specific rendering decision to omit those null attributes when producing its result. The same could be done for the format you are building, assuming it's something different to an IAM policy.

apparentlymart avatar Feb 26 '24 16:02 apparentlymart

These are aws_opensearchserverless_* policies. I have to write my own opensearchserverless module, because I haven't been able to find one by Hashicorp in the module registry.

I am not asking terraform to be aware that I wanted to convert to JSON after my lookup, or to more generally do "the adaptation for another system's encoding format"; I just wanted the bool literal in the original map to survive the lookup and assignment as a bool literal, so I could create the other system's encoding format.

jaffel-lc avatar Feb 26 '24 17:02 jaffel-lc

Thanks for that extra context!

To be clear, I'm just trying to give you context about why Terraform behaves the way it does and why I perceive it as an example of translating between type systems rather than as a bug, since your earlier feedback seemed to be implicitly asking about that. Terraform perhaps obscures this problem more than some other languages due to it having a structural type system that is built around "JSON-like" concepts, but it's important to remember that Terraform's type system and JSON's type system (such that it is) are not exactly equivalent, and so there will be some cases where you can't "just" write exactly what you would write in JSON.

My understanding of the state of this issue is that it represents the fact that Terraform gave you poor feedback about what was going on here, despite it being intended behavior: it guessed incorrectly about what you wanted your expression to mean, rather than returning a type mismatch error as I think it ideally should have. This issue is therefore representing hopefully finding some way to improve that, which might have to wait until a future edition of the Terraform language.

My other remarks about provider-contributed functions are intended as separate ideas for how Terraform could help you solve your problem despite the Terraform type system and the AWS policy document format not being exactly aligned. Specifically (now that I know which resource types you're talking about) I imagine the hashicorp/aws provider offering a new function which is similar to jsonencode but that has a specific object type constraint for its argument -- so Terraform Core has more information about what data types are expected here -- and is aware of the rules the remote API enforces about unset attributes being represented as absent JSON properties, rather than as properties set to JSON null. You would then assign the result of that function to the policy argument of aws_opensearchserverless_security_policy, instead of using the generic jsonencode.

That would ultimately be a change in the AWS provider rather than in Terraform Core though, so would not be directly in scope for this issue.

apparentlymart avatar Feb 26 '24 19:02 apparentlymart

Thank you very much Martin. I appreciate your time and effort.

jaffel-lc avatar Feb 26 '24 20:02 jaffel-lc

Hi @jaffel-lc

I have crossed this issue because the same use case (Create module for Opensearch serverless). I have been able to find a workaround. You can check my workaround here (https://registry.terraform.io/modules/fdmsantos/opensearch-serverless/aws/latest)

Also you are welcome to use the module and help to improve it

fdmsantos avatar Mar 22 '24 15:03 fdmsantos