configurable output format for yamlencode
Current Terraform Version
Terraform v0.12.13
+ provider.null v2.1.2
Use-cases
I want to use Terraform to generate YAML-formatted configuration files for an Ansible-based installation.
Attempted Solutions
Currently we dump the output of jsonencode(var.some_map_var) to a file on the target system, and then use remote-exec to run a Python script that parses the .json file and generates the desired config.yaml.
With a map of
some_map_var = {
  foo   = ["bar", "baz"]
  dofoo = true
}
This generates nice YAML that Ansible can use, i.e.
foo:
- bar
- baz
dofoo: true
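Roughly, the wiring looks like this; this is only a minimal sketch of the current setup, with the script path, file names, and connection details as placeholders:

resource "null_resource" "render_config" {
  # connection block omitted for brevity

  # Copy the JSON-encoded map to the target host.
  provisioner "file" {
    content     = jsonencode(var.some_map_var)
    destination = "/tmp/config.json"
  }

  # Run the Python converter that writes the YAML file Ansible consumes.
  provisioner "remote-exec" {
    inline = [
      "python /tmp/json_to_yaml.py /tmp/config.json /etc/ansible/config.yaml",
    ]
  }
}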
Having discovered the yamlencode function in 0.12, this seems like a really nice option to avoid the escape hatch of the remote-exec Python script and stay truer to a Terraform-native end-to-end workflow.
However, the current yamlencode function seems to produce a file like this:
"foo":
- "bar"
- "baz"
"dofoo": true
where all the keys are quoted (I guess because they are strings), rather than giving us a nice unquoted UTF-8 YAML file as we get with our Python parser. This seems to create some issues for Ansible.
Proposal
Allow (at least via a config switch) generating YAML files that do not quote keys and values.
References
+1, I think we should produce some nice looking YAML :)
Ran into this problem today. The quotations are causing weird issues with Kubernetes config maps (I have to embed a YAML into a config map key)
+1. Want to be able to create config maps from terraform maps.
This causes problems in a number of environments where downstream applications consume YAML but dislike the "quote everything" + "alphabetically sorted" output of the yamlencode function.
Yes, it would be nice to keep the original ordering of fields in the template file + remove the quotes. You could use a beautifier for that within Terraform after converting to YAML.
+1, encountered this issue just now
+1, Also encountered this issue today
I have a workaround for this. It's working for me, but beware: it's kinda hacky.
Having this variable:
some_map_var = {
  foo   = ["bar", "baz"]
  dofoo = true
}
Wrap it with a regex replace function:
replace(yamlencode(var.some_map_var), "/((?:^|\n)[\\s-]*)\"([\\w-]+)\":/", "$1$2:")
Results in this output:
foo:
- "bar"
- "baz"
dofoo: true
Hi all! Sorry for the slow response here.
I was just reviewing the comments here and it seems like while some of the comments could be considered just a matter of style preference (some folks prefer the unquoted YAML style, which is fair enough), I also see several of you talking about situations where other software has refused to process the yamlencode results.
To summarize I see:
- "This seems to create some issues for Ansible."
- "The quotations are causing weird issues with Kubernetes config maps"
- "a number of environments where downstream applications consume YAML but dislike the 'quote everything' + 'alphabetical sorted' output"
When we first introduced yamlencode we did try to leave some room for making subtle improvements to its output by marking it as experimental, but in practice I think it's more-or-less fixed in place now, because we don't really want to cause churn (potentially involving forced replacement) for existing callers just for stylistic preferences.
However, I expect we would make some different tradeoffs if it turned out that what yamlencode is producing is invalid in some way, such that it can't be parsed by other valid YAML parsers. If you all can share some more concrete examples of output that yamlencode produces that specific other software won't accept, then I'd love to review those in a little more detail and see if we can find a compromise that would help those applications work without creating the broad churn for existing users that I'm worried about. If you have any links to relevant documentation for that software to share alongside those concrete examples, that'd be extra helpful, since the Terraform team isn't necessarily intimately familiar with the details of other software.
Since generating YAML is only an ancillary use-case for Terraform and not its primary purpose, I don't expect that we would invest in a highly-configurable yamlencode function: that'd make the function far more complex than originally intended, and it's already pretty complicated. However, I would like to see about adjusting its output so that its single available behavior is more useful by being more compatible with existing software, if we can.
Thanks!
I do want to note that there's a key difference here between a purely stylistic tradeoff like string quoting compared to the functional difference of specifying map keys in a particular order. For the latter, it's not yamlencode that's discarding the ordering but rather the Terraform language itself, because Terraform's map type is unordered.
Dealing with these various little differences between type systems is part of the game when it comes to cross-language serialization formats, so I'd hope that anyone writing a YAML parser would be pragmatic and realize that there are plenty of languages which (like Terraform) don't have order-preserving mapping types.
If not though, unfortunately I don't think we can really help much with that because the original ordering information just isn't there, and often wasn't inherent in the source data in the first place if e.g. the map was constructed dynamically using a function. If you need that level of control, you'd need to use a different strategy to generate YAML mechanically yourself, such as generating it from a template where you can dictate exactly which punctuation, whitespace, and ordering the result would have.
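For illustration only, a template-driven approach could look roughly like this; the template file name and variable names are hypothetical, but it shows that ordering, quoting, and indentation stay entirely under the author's control:

# config.yaml.tftpl (name is illustrative) -- the author writes the
# punctuation and ordering directly:
#
#   foo:
#   %{ for item in foo ~}
#   - ${item}
#   %{ endfor ~}
#   dofoo: ${dofoo}

locals {
  config_yaml = templatefile("${path.module}/config.yaml.tftpl", {
    foo   = ["bar", "baz"]
    dofoo = true
  })
}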
@apparentlymart I think the point is the yamlencode function does not produce valid YAML, at all, for anything. No YAML should have the maps, keys, and lists in quotes. That is not the standard anywhere and parsers that encode or lint proper YAML syntax would have a problem with this.
For example, the AWS EKS Terraform module created by AWS uses yamlencode to render data for the aws-auth configMap in Kubernetes. This defines the mapping of AWS IAM accounts, roles, and users to Kubernetes groups and users for access control to the entire cluster. https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/aws_auth.tf#L81
The YAML data in the configMap should look like this, as shown in their documentation. This is a standard Kubernetes manifest https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
However, this is what you get as a result of using the Terraform EKS module, which aggregates maps and lists in locals and then uses the yamlencode function to render the YAML for the data in the configMap that is created with the Kubernetes provider's configMap resource. None of these quotes should have been added; they are not required, and they go against the entire point of YAML being more human-readable. I don't think I've seen any YAML parser that puts everything in quotes like this.
mapRoles: |
  - "groups":
    - "system:bootstrappers"
    - "system:nodes"
    "rolearn": "arn:aws:iam::{redacted}:role/eks-workers-role"
    "username": "system:node:{{EC2PrivateDNSName}}"
  - "groups":
    - "system:masters"
    "rolearn": "arn:aws:iam::{redacted}:role/AWSReservedSSO_AdministratorAccess"
    "username": "AWSReservedSSO_AdministratorAccess"
  - "groups":
    - "system:bootstrappers"
    - "system:nodes"
    "rolearn": "arn:aws:iam::{redacted}:role/eks-node-role"
    "username": "system:node:{{EC2PrivateDNSName}}"
The yamlencode function is of little use without replace functions to strip the quotes, which is not ideal and not always possible to do. I might not want to replace every " with an empty string, because sometimes the quotes are meaningful: I may want to keep a boolean type (e.g. key: true) or cast the value to a string (e.g. key: "true"), and the same goes for numbers, 123456 versus the string "123456".
Another example would be to look at any Kubernetes manifest YAML, or to use the Helm template command to render a chart into manifest YAML. The only time you'd have quotes around a value is for things like numbers that you want to be treated as a string type. You don't even need quotes or escapes when a key name has . or / within it, as long as it appears before the :
kind: ConfigMap
metadata:
  creationTimestamp: "2021-08-28T04:45:12Z"
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "10831359"
The reason why this is so extremely important, and why you do not want this, is that, unlike JSON, whitespace matters in YAML. The number of spaces (not tabs), and therefore the indentation of maps and lists and how data is nested, is what makes the document valid and meaningful in most cases.
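As a small illustration (not taken from any output above), the same two keys mean different things depending only on indentation:

# Indented two spaces: "replicas" is a field of "spec".
spec:
  replicas: 3

# Not indented: "spec" is empty and "replicas" is a separate top-level key.
spec:
replicas: 3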
+1 for non-quoted YAML keys and most values
don't get me wrong, I would 100% prefer a way to only use quoted keys/values when it's required, but...
I think the point is the yamlencode function does not produce valid YAML, at all, for anything. No YAML should have the maps, keys, and lists in quotes. That is not the standard anywhere and parsers that encode or lint proper YAML syntax would have a problem with this.
@fitchtech fyi the YAML 1.1 spec does actually have examples of quoted keys being valid YAML, which they totally are. it's just not super well spelled out imo.
the YAML 1.2 spec has a slightly different example, but demonstrates the same validity of quoted keys.
@joshsleeper while it may be valid, it does cause issues. Also, it does not follow proper YAML styling. Using quotes has a specific meaning in YAML, unlike JSON or HCL. For example, if I have locals { number = 12345 } that's specifying a number data type. So I would expect the YAML equivalent to be..
number: 12345
And not this..
"number": "12345"
That's not what I declared or want as the output. It should be the same data type and only cast to string when set that way.
For example, if it were locals { number = "12345" } that's a string and would then expect the YAML encoded output to be..
number: "12345"
It just doesn't make sense to put all the keys, values, and maps in quotes like this. It's not useful in practical application and I always avoid it.
An easier approach with cleaner YAML is to use the templatefile function with a map-of-maps variable that inserts your YAML blocks into a template file, using a string template for each expression. Nesting that within yamldecode in locals then lets you pass it to other blocks easily, like the data block of a Kubernetes ConfigMap resource.
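A rough sketch of what I mean; the resource names, the template file, and the variable here are illustrative:

locals {
  # Render the YAML exactly as written in the template, and decode it
  # if the same data is also needed as a Terraform object elsewhere.
  map_roles_yaml = templatefile("${path.module}/map_roles.yaml.tftpl", {
    node_role_arn = var.node_role_arn
  })
  map_roles = yamldecode(local.map_roles_yaml)
}

resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # The ConfigMap gets the hand-written YAML string, untouched by yamlencode.
    mapRoles = local.map_roles_yaml
  }
}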
while I agree that arbitrarily quoting numbers and boolean values would be a problem, I'm not seeing such behavior in yamlencode() at this point in time?
# sample.tf
locals {
  test_yamlencode = yamlencode({
    string_key : "string_value"
    simple_number : 123
    complex_number : 1e+3
    123 : 123
    bool_key : false
    map : [
      "map_string", 456, true,
    ]
  })
}

output "test_yamlencode" {
  value = local.test_yamlencode
}
$ terraform plan

Changes to Outputs:
  + test_yamlencode = <<-EOT
        "123": 123
        "bool_key": false
        "complex_number": 1000
        "map":
        - "map_string"
        - 456
        - true
        "simple_number": 123
        "string_key": "string_value"
    EOT
You can apply this plan to save these new output values to the Terraform state, without changing any real infrastructure.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you
run "terraform apply" now.
string, boolean, and number values passed to yamlencode(), both in and outside of a map, all seem to end up with the correct YAML types after encoding, not all arbitrarily quoted and turned into strings as you suggest. you're right though, that would def be an issue if it was doing that!
the only change I'm really seeing is it forcing key quoting (which really should be considered a style thing since all keys act like strings and it's perfectly valid according to the spec) and forcing string value quoting (which again is perfectly valid and often recommended to avoid special characters behaving oddly).
@joshsleeper didn't realize it was not quoting numbers and bool type values, at least. It still seems strange that all the other keys and string values are in quotes despite that being unnecessary. IMHO the only time it should be quoted in the YAML is when you want a number or bool cast as a string, e.g. "12345" or "true".
Thanks for raising the question about the use of quotes, and for the efforts here to uncover whether it represents a practical problem for interoperability with other software.
yamlencode intentionally always uses quoted strings because a significant change between YAML 1.1 and YAML 1.2 was a change to the implicit tagging rules for plain scalars, and in particular YAML 1.1 leaves the interpretation of plain scalars to be defined by the application, rather than directly defining it.
Using quoted strings universally is therefore a compromise that ensures that most other parsers (of both YAML versions) will interpret the value as a string without incurring the high readability cost of writing out explicit type tags. We intend the result to follow the YAML 1.2 core schema while also being unambiguous to a YAML 1.1 parser (as far as possible, given that YAML 1.1 intentionally treats various parsing rules as application-defined).
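To make that concrete, here are a few illustrative plain scalars (not taken from any comment here) whose interpretation commonly differs between YAML 1.1 and YAML 1.2 parsers; quoting sidesteps the ambiguity:

country: NO       # many YAML 1.1 parsers: boolean false; YAML 1.2 core schema: the string "NO"
window: 22:30     # many YAML 1.1 parsers: sexagesimal integer 1350; YAML 1.2: the string "22:30"
---
country: "NO"     # unambiguously a string for either parser generation
window: "22:30"   # unambiguously a string for either parser generation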
Based on what we've seen so far, this seems like an example of a style preference rather than an interoperability problem, and thus not within the scope of changes we'd consider making to yamlencode.
I have a case where I am using Terraform + SaltStack + Consul. I have SaltStack set up to read pillar information from Consul:
consul_config root=saltstack/private/%(minion_id)s
In Terraform I would write a key named role that assigns a role to an EC2 instance I am provisioning. An instance can have multiple roles:
resource "consul_keys" "master" {
key {
path = "saltstack/private/${aws_route53_record.master.fqdn}/role"
value = yamlencode(["salt_master", "consul_server", "netdata_server"])
}
}
This writes the key as follows:
+ key {
    + delete = false
    + flags  = 0
    + path   = "saltstack/private/master.domain.tld/role"
    + value  = <<-EOT
          - "salt_master"
          - "consul_server"
          - "netdata_server"
      EOT
  }
The key's value has the quotes when I look at it in Consul. However, it seems that SaltStack is able to handle it just fine and removes the quotes:
root@master:~# salt '*' pillar.items
master.domain.tld:
    ----------
    role:
        - salt_master
        - consul_server
        - netdata_server
Even though it seems to work for my use case, it's odd because normally you wouldn't put quotes around those items if you were defining this in a local YAML file. Most people would probably be thrown off by this behavior (I was initially).
I use this workaround
set {
  name = "config"
  value = replace(yamlencode({
    region : "eu-west-1",
    set_timestamp : "false",
    period_seconds : "240",
    metrics : [
      {
        aws_namespace : "AWS/RDS",
        aws_metric_name : "ReadLatency",
        aws_dimensions : "[DBInstanceIdentifier]",
        aws_dimension_select : "{DBInstanceIdentifier : [db-complete-mysql-444105]}",
        aws_statistics : "[Average]"
      },
    ]
  }), "\"", "")
}
In addition to the things above, this causes configuration drift for the Terraform rancher_app_v2 input, which seems to format YAML in a different way; as a result, there are always configuration drifts when using yamlencode output as the rancher_app_v2 values input.
Hi @herrbpl,
In Terraform's architecture, part of the responsibility of a provider is to include rules to recognize the difference between two values that are materially different -- that is, the meaning has changed -- vs. two values that are just two different ways to write down the same information.
There are already lots of examples of providers handling this for JSON, where remote APIs will often accept JSON as input but store the data internally in some other format, re-serializing it to JSON on read and therefore potentially producing a different serialization.
Although this is the first example I've seen of a system doing this with YAML -- and surprising, because presumably that means it will also discard any comments you included in the input, thus defeating a main benefit of YAML over JSON -- I think the same architectural principle still applies: the Rancher provider ought to have a rule to detect when two values are serializations of the same data and classify that as an immaterial change, to allow the configuration and state to converge.
I'd suggest recording that as a feature request for the provider. Unfortunately since I think this is the first example of doing it for YAML in particular, rather than for e.g. JSON, it'll take some extra up-front work to write a comparison function for YAML, whereas in JSON situations there is one built into the SDK which can handle many simple situations. However, I assume the same principle will apply as for the JSON equivalent: parse both the old and the new to discard the irrelevant syntax details, and then compare them to see if there are any remaining differences beyond just syntax.
Thanks for the detailed reply. Now that I think of it, I seem to recall rancher_app and app_v2 use a string for the values input. Even an extra line feed causes drift. I'll post this to their provider tracker.
My 2c: providers should never* deal with YAML directly. There is very rarely a situation where JSON wouldn't be better: you can reasonably normalize JSON for most applications, thereby preventing drift without having to parse it and compare the parsed tree. And JSON is a subset of YAML these days, so all YAML-compliant apps should be able to handle it.
If at any point along the chain anything re-encodes the YAML, you're almost certainly going to lose stylistic information anyway: AFAIK there exists no YAML re-encode process that perfectly preserves stylistic info (all whitespace, all quote styles, all comments). So if your application only deals with the subset of YAML structure that is JSON-compatible, you may as well use JSON, because your YAML's going to get mangled anyway. Style-preserving YAML is almost a fundamentally separate type from we-only-care-about-data YAML.
For example, take the helm_release's value field. It accepts a YAML string. Yet the actual value getting written to the k8s HelmRelease object is a string of JSON!
Values *apiextensionsv1.JSON `json:"values,omitempty"`
Indeed, when it loads values from state, it uses a YAML unmarshaller that converts from JSON and therefore cannot preserve in state anything that JSON cannot represent. So why not just store JSON in the state? It can't actually handle non-JSONable YAML, and it's not going to preserve/diff comments in the actual resource. Better yet, instead of making it a JSON string, make it a map(any) and make the user decode it.
Anyway, my point is: I'm guessing YAML re-encoding stability is not actually that necessary in practice, because no real API actually wants a yamlencode'd string in the first place. Happy to see a solid counterexample though!
(* Exception might be when the output is meant for human consumption and you need to preserve its exact stylistic structure, comments, etc. but I'm hard-pressed to think of an example of that in the Terraform realm.)
One counterexample would be cloud-init. I would argue that you could just store the shebang-style comment and the body separately, then mix them together in YAML for the user when writing to the API.
I agree that it would be weird for a provider to itself be dealing with YAML. I think the main situations for yamlencode are those like the cloud-init example you mentioned, where there is some other system at least two hops away from Terraform that is expecting YAML and the API that the provider is directly interacting with just expects an arbitrary bag of bytes to pass on to that remote system. In that case, it would not be possible for the provider to detect and handle normalization because the content of the bag of bytes is opaque to the provider. But also, it doesn't typically matter because often that system that ultimately uses the YAML doesn't get any opportunity to normalize it in a way that would reflect back in the API, and so the bag of bytes remains verbatim what the author originally submitted.
I do find the Rancher example surprising for this reason, but I'm not familiar enough with Rancher to understand the details of what's going on there. It seems like either the Rancher provider or the Rancher API are directly using the YAML but are reflecting it back in a normalized form, which is pretty unusual as I mentioned above and I've still not encountered another example of such a design.
I'd rather keep discussions about the designs of specific providers in those providers' own issue trackers though, so that their authors (who know far more about the underlying systems than I do) can be the ones to make the necessary tradeoffs. For our purposes with this issue, if a provider has behavior like discussed above where it (or the API it interacts with) accepts YAML and normalizes it then it would be the provider's responsibility to classify that normalization as normalization, so that Terraform will not report it as a meaningful change. Whether the provider should be doing that is a matter for the provider developers to consider for themselves, but the previous situation is one of the consequences they should consider when making that decision.
I'd rather keep discussions about the designs of specific providers in those providers' own issue trackers though, so that their authors
Yep, I was just using helm_release as an example.
I think you're right: it's up to the provider to know its resource API details and avoid drift where there isn't a meaningful change. Namely, it should not be up to the user, via normalization flags to yamlencode or otherwise, to ensure that non-functional drift doesn't occur.
So I'd say the solution to this particular issue is just a clear Terraform policy around that, that users and provider devs can be pointed at when this comes up.
That said, my advice as a provider dev is to never do API calls with raw YAML if it can be avoided.
Hi @apparentlymart --
I am assuming you're affiliated with Hashicorp and Terraform. Thank you for your answers and for your effort here.
The nature of this thread reveals a core truth of Terraform, namely that it is a semantically correct and pure software tool.
There are a multitude of use-cases for producing YAML (and JSON) as these are the primary data interchange mechanisms used by modern software. While you assert that it's not a primary function of the software, that cannot really be true, as a fundamental purpose of Terraform is to interoperate with other software. If it is the case that the latest version of YAML allows for quoting, that's delightful, but it's not anyone's current reality. It may be pure, but it ain't real :-)
I try to do things right as often as possible in the software I work on. But I work in reality. I hope you and other Terraformers will understand the day-to-day challenges those of us who do battle daily are faced with, and think about ways to be right by default and flexible as an option. A yamlencode function that produces YAML that other systems (in my case, Buildkite) cannot consume is of pretty limited use.
I have great respect for and appreciation of the Terraform tool and team. Thanks for listening.
Hi @tomharrisonjr!
My request above was to share specific examples of software that doesn't implement YAML in a way that supports the format that Terraform is generating, in which case we would review whether it is either Terraform or the other software that is incorrect and adjust Terraform if appropriate.
I'm still willing to do that, and it does sound like you have a potential example to share. Can you say a little more about what's going on with Buildkite that is causing you problems? I understand that Buildkite is closed-source SaaS software and so it isn't possible for you to describe details about its implementation, but if you can show the input you tried to send to Buildkite (with yamlencode) and the specific error or other problem you encountered when you did so, then I'd be happy to review it and see what we might do to improve compatibility here. We need to see exactly what the problem is, though, so we can see exactly what minimal change is needed to achieve compatibility.
The produced YAML causes issues when the ordering from the Terraform code is not preserved. For example, here is an aws_imagebuilder_component which uses the S3Download action and which currently fails because the produced YAML swaps the source and destination object properties.
terraform code:
resource "aws_imagebuilder_component" "prod_scheduler_tasks" {
data = yamlencode({
phases = [
{
name = "build"
steps = [
{
name = "download-task-scripts"
action = "S3Download"
inputs = [
{
source = "s3://${aws_s3_bucket.image_builder.id}/scheduled_task_scripts/*",
destination = "C:\\Automation\\"
}
]
onFailure = "Abort"
}
]
}]
schemaVersion = 1.0
})
name = "install-scheduled-tasks-${var.environment}"
description = "Installs Tasks Scheduler tasks for Prod env"
platform = "Windows"
version = "0.0.1"
}
And that produced this YAML code:
resource "aws_imagebuilder_component" "prod_scheduler_tasks" {
arn = "arn:aws:imagebuilder:eu-west-1:685621570121:component/install-scheduled-tasks-test/0.0.2/1"
data = <<-EOT
"phases":
- "name": "build"
"steps":
- "action": "S3Download"
"inputs":
- "destination": "C:\\Automation\\"
"source": "s3://image-builder-test/scheduled_task_scripts/*"
"name": "download-task-scripts"
"onFailure": "Abort"
...
}
In the above, the source property needs to come before the destination property in order for that "S3Download" component step to work.
Hi @georgikoemdzhiev! Thanks for sharing that.
Do you know which software is ultimately parsing and decoding that YAML document? I see that you are passing it to an AWS provider resource type, but I'm not sure whether it's the AWS provider which parses it or if it just sends that whole string to some other system which then parses it. I'd like to identify who owns the parser so we can understand the impact of this difference.
This situation is unfortunately more fundamental than just customizing the output format, because listing source before destination here requires information that Terraform doesn't have. As is the case in several other languages, maps in Terraform are not an order-preserving data type and so the order of definition of elements in a constructor is only a source code artifact and has no effect on the behavior at runtime. I don't think allowing a caller to control the serialization order for map elements will be possible with yamlencode in its current design. Instead, we'd need a new function which defines some way to describe a YAML mapping in terms of a Terraform sequence (list or tuple), because those are the two data types in Terraform that retain element order.
I wonder how this YAML structure would be described in other languages that similarly do not retain the declaration order of a constructed map. 🤔
Hello, thank you for addressing my comment.
Do you know which software is ultimately parsing and decoding that YAML document?
I believe the software that parses the YAML is AWSTOE, which is used by AWS Image Builder itself, but I am not sure. Looking at the Image Builder docs it certainly sounds like that is the software parsing the YAML.
This is an extract from the docs: "Image Builder uses the AWS Task Orchestrator and Executor (AWSTOE) component management application to orchestrate complex workflows."