Support multi-document YAML files with yamldecode()
yamldecode() returns an error if the YAML contains more than one document, which causes significant complications when working with Kubernetes. Please remove this restriction and return a list of objects when the input is multi-document YAML. Alternatively, you could introduce a yamllistdecode() function that always treats the input as multi-document YAML and always returns a list of objects.
Current Terraform Version
Terraform v1.0.8
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.11.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.5.0
...
Use-cases
Given an arbitrary Kubernetes YAML manifest, e.g. a CRD manifest downloaded from a site, one should be able to apply it using the code below. Related Stack Overflow questions: 1 and 2
# DOES NOT WORK -- yamldecode() will fail on multi-document YAML
resource "kubernetes_manifest" "crd" {
  # create a map of { "kind" -- "name" => yaml }
  for_each = {
    for value in yamldecode(data.http.crd.body) : "${value["kind"]}--${value["metadata"]["name"]}" => value
  }
  manifest = each.value
}
Attempted Solutions
This solution works OK, but it is obviously very brittle because of all the regex-based YAML manipulation:
resource "kubernetes_manifest" "crd" {
  # Create a map { "kind--name" => yaml_doc } from the multi-document yaml text.
  # Each element is a separate kubernetes resource.
  # Must use \n---\n to avoid splitting on strings and comments containing "---".
  # YAML allows "---" to be the first and last line of a file, so make sure
  # raw yaml begins and ends with a newline.
  # The "---" can be followed by spaces, so need to remove those too.
  # Skip blocks that are empty or comments-only in case yaml began with a comment before "---".
  for_each = {
    for value in [
      for yaml in split(
        "\n---\n",
        "\n${replace(data.http.crd.body, "/(?m)^---[[:blank:]]*(#.*)?$/", "---")}\n"
      ) :
      yamldecode(yaml)
      if trimspace(replace(yaml, "/(?m)(^[[:blank:]]*(#.*)?$)+/", "")) != ""
    ] : "${value["kind"]}--${value["metadata"]["name"]}" => value
  }
  manifest = each.value
}
Proposal
Per above, allow yamldecode(...) to decode multi-document YAML files, or introduce an additional parameter or a new function to handle them.
See also
The 3rd party kubectl provider even introduced a dedicated kubectl_file_documents data source to handle this specific case.
Would the Terraform team be open to a PR?
@mvoitko how difficult would it be to implement something like this? I think such simple functionality is likely to be accepted by the TF core team, but of course it could also result in wasted effort (hopefully not).
Discussing and validating a potential solution (even if it seems fairly straightforward on the surface) will increase the chances that a PR gets accepted. I'd recommend writing up a description of the proposed changes before starting work, and verifying there aren't any hidden reasons the functionality exists in its current form. I hope this helps!
Documentation on yamldecode() states clearly that:
Only one YAML document is permitted. If multiple documents are present in the given string then this function will return an error.
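The documented restriction can be observed without touching any provider. The following is only a minimal sketch: the local names and the sample string are made up for illustration, and it relies on the built-in can() function, which returns false when the wrapped expression produces an error.

```hcl
locals {
  # Hypothetical sample: two YAML documents in one string, separated by "---".
  multi_doc = "a: 1\n---\nb: 2\n"

  # Per the documentation quoted above, yamldecode() errors on this input,
  # so can() is expected to report that the expression does not evaluate.
  decodes_as_single = can(yamldecode(local.multi_doc))
}

output "decodes_as_single" {
  value = local.decodes_as_single
}
```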
This contributor describes a way to do it with locals, the split function (splitting on the --- separator that starts a new document), and a count inside the kubernetes_manifest resource.
Your way is more refined when we look at the regex (spaces, comments that might be found around ---).
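For readers who have not seen the linked approach, it might look roughly like the sketch below. This is a reconstruction under assumptions, not the contributor's actual code: the file name crds.yaml and the local name are hypothetical, and the split is deliberately naive, so it inherits the limitations around spaces and comments near the separator.

```hcl
locals {
  # Naive split: assumes "---" only ever appears alone on a line as a
  # document separator, with no trailing spaces or comments.
  # "crds.yaml" is a hypothetical file name.
  crd_documents = split("\n---\n", file("${path.module}/crds.yaml"))
}

resource "kubernetes_manifest" "crd" {
  # One resource instance per YAML document, addressed by its position.
  count    = length(local.crd_documents)
  manifest = yamldecode(local.crd_documents[count.index])
}
```

Because instances are keyed by position, reordering documents in the file changes which instance maps to which object; the for_each variant keyed by kind and name avoids that.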
You can use https://github.com/patrickdappollonio/kubectl-slice as a workaround
curl -sL https://github.com/patrickdappollonio/kubectl-slice/releases/download/v1.2.1/kubectl-slice_1.2.1_linux_x86_64.tar.gz | tar -xvzf -;
rm -rf slices hcl;
./kubectl-slice -f document.yaml -o slices 2>&1 | grep -oP "Wrote \K.+yaml" | while read yamlfile; do echo 'yamldecode(file("'$yamlfile'"))' | terraform console >>hcl; done;
cat hcl
@AndreiBanaruTakeda that simple split example is not very good because it ignores a lot of corner cases. See the example above in the Attempted Solutions section: it also splits by --- but tries to handle many more cases. I went through many iterations with it, and I know there are still some edge cases that I am not handling, which is why I think it is important to implement this as part of a Terraform core function.
@feczo We already have a workaround. I am talking about a more robust solution.
Instead of string splitting, a nice way of traversing multi-doc yaml https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.
Instead of string splitting, a nice way of traversing multi-doc yaml https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.
Instead of one function, this workaround suggests using a new provider.
It really feels like this provider should support this use case, without relying on another provider to parse the multi-YAML document
@brettjacobson we are not talking about "another provider" -- instead this issue is about the built-in terraform function to parse YAML, without any specific provider's support.
terraform-multidecoder-yaml_json
Access multiple YAML and/or JSON files with their relative paths in one step.
Documentation can be found here:
GitHub: https://github.com/levmel/terraform-multidecoder-yaml_json
Terraform Registry: https://registry.terraform.io/modules/levmel/yaml_json/multidecoder/latest
Usage
Place this module in the location where you need to access multiple different YAML and/or JSON files (different paths possible) and pass your path/-s in the parameter filepaths which takes a set of strings of the relative paths of YAML and/or JSON files as an argument. You can change the module name if you want!
module "yaml_json_decoder" {
  source    = "levmel/yaml_json/multidecoder"
  version   = "0.2.1"
  filepaths = ["routes/nsg_rules.yml", "failover/cosmosdb.json", "network/private_endpoints/*.yaml", "network/private_links/config_file.yml", "network/private_endpoints/*.yml", "pipeline/config/*.json"]
}
Patterns to access YAML and/or JSON files from relative paths:
To access all YAML and/or JSON files in a folder, enter your path as "folder/rest_of_folders/*.yaml", "folder/rest_of_folders/*.yml" or "folder/rest_of_folders/*.json".
To access a specific YAML and/or JSON file in a folder structure, use "folder/rest_of_folders/name_of_yaml.yaml", "folder/rest_of_folders/name_of_yaml.yml" or "folder/rest_of_folders/name_of_yaml.json".
If you would like to select all YAML and/or JSON files within a folder, use the "*.yml", "*.yaml", "*.json" notation (see the Usage section above).
YAML delimiter support is available from version 0.1.0!
WARNING: Only the relative path must be specified. The path.root (it is included in the module by default) should not be passed, but everything after it.
Access YAML and JSON entries
Now you can access all entries within all the YAML and/or JSON files you've selected like that: "module.yaml_json_decoder.files.[name of your YAML or JSON file].entry". If the name of your YAML or JSON file is "name_of_your_config_file" then access it as follows "module.yaml_json_decoder.files.name_of_your_config_file.entry".
Example of multi YAML and JSON file accesses from different paths (directories)
first YAML file:
routes/nsg_rules.yml
rdp:
  name: rdp
  priority: 80
  direction: Inbound
  access: Allow
  protocol: Tcp
  source_port_range: "*"
  destination_port_range: 3399
  source_address_prefix: VirtualNetwork
  destination_address_prefix: "*"
---
ssh:
  name: ssh
  priority: 70
  direction: Inbound
  access: Allow
  protocol: Tcp
  source_port_range: "*"
  destination_port_range: 24
  source_address_prefix: VirtualNetwork
  destination_address_prefix: "*"
second YAML file:
services/logging/monitoring.yml
application_insights:
  application_type: other
  retention_in_days: 30
  daily_data_cap_in_gb: 20
  daily_data_cap_notifications_disabled: true
  logs:
    # Optional fields
    - "AppMetrics"
    - "AppAvailabilityResults"
    - "AppEvents"
    - "AppDependencies"
    - "AppBrowserTimings"
    - "AppExceptions"
    - "AppExceptions"
    - "AppPerformanceCounters"
    - "AppRequests"
    - "AppSystemEvents"
    - "AppTraces"
first JSON file:
test/config/json_history.json
{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}
main.tf
module "yaml_json_multidecoder" {
  source    = "levmel/yaml_json/multidecoder"
  version   = "0.2.1"
  filepaths = ["routes/nsg_rules.yml", "services/logging/monitoring.yml", "test/config/*.json"]
}
output "nsg_rules_entry" {
  value = module.yaml_json_multidecoder.files.nsg_rules.aks.ssh.source_address_prefix
}
output "application_insights_entry" {
  value = module.yaml_json_multidecoder.files.monitoring.application_insights.daily_data_cap_in_gb
}
output "json_history" {
  value = module.yaml_json_multidecoder.files.json_history.glossary.title
}
Changes to Outputs:
- nsg_rules_entry = "VirtualNetwork"
- application_insights_entry = 20
- json_history = "example glossary"
@levmel Does this support multiple YAML objects in the same YAML file, delimited by ---?
key: value
---
key: value
Then parses 2 YAML objects?
@brandongallagher-tag Not yet, to be honest, because I always split my configuration logically into different config files. That is why I never use delimiters in my config. I can add it in the next release.
For now it supports multiple YAML objects that are located in the same file, but without the delimiter.
Edit: I added delimiter support and jsondecode support in the newest version 0.2.1. Check out my GitHub repo. :) Cheers! Btw. I updated my previous comment completely.
Instead of string splitting, a nice way of traversing multi-doc yaml https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.
I think this is the best solution. My implementation for installing multi-document YAML with the kubectl provider is as follows:
data "http" "yaml_file" {
  url = "https://path.to.yaml.file.yaml"
}
data "kubectl_file_documents" "docs" {
  content = data.http.yaml_file.response_body
}
locals {
  yaml_file = [
    for v in data.kubectl_file_documents.docs.documents : {
      data : yamldecode(v)
      content : v
    }
  ]
}
resource "kubectl_manifest" "install" {
  for_each = {
    for v in local.yaml_file : lower(join("/", compact([
      v.data.apiVersion,
      v.data.kind,
      lookup(lookup(v.data, "metadata", {}), "namespace", ""),
      lookup(lookup(v.data, "metadata", {}), "name", "")
    ]))) => v.content
  }
  yaml_body = each.value
}
But this is for applying kubectl files, not for general use cases.
I found a way to do it w/o any additional providers. Let me know if it works for your particular use case.
# Assign to a local after splitting by "---"
# (templatefile requires a vars argument; pass {} when there are no variables)
locals {
  my_manifests = split("---", templatefile("multiple-manifests.yaml", {}))
}
# Iterate by mapping with a range
resource "kubernetes_manifest" "many_objects" {
  for_each = zipmap(range(0, length(local.my_manifests)), local.my_manifests)
  manifest = yamldecode(each.value)
}
Hi. I was facing this exact question at this moment, when trying to process (via templates) multiple kubernetes files, converting them to HCL map/objects/...
The workarounds that are proposed seem to work in the majority of cases, but will probably fail if "---" occurs anywhere else in the files or, even worse, if we use heredocs with embedded multi-document YAML (in which case the heredoc-ed YAML will also get split, creating a mess).
For more robust handling, something needs to parse the input into multiple documents (as the kubectl_file_documents data source mentioned above does). Unfortunately gavinbunney/kubectl seems to be abandoned, and the fork that currently seems more active and is volunteering to fix some issues (alekc/kubectl) still has not managed to address the pending ones, which makes this difficult to recommend as a general solution.
The following algorithm works well for me:
- Detect the document separator and replace it with SPECIAL_TOKEN="YAMLSEPARATOR"
- Split the string on the special token
- Remove invalid strings, which may contain only comments: for that we try to decode the YAML.
- At the end, compact the array to remove empty strings
- We will have separate yaml documents as array
locals {
  yamlstrings = compact([
    for s in split("YAMLSEPARATOR", replace(file("${path.module}/files/standard-install.yaml"), "/(?m:^---$)/", "YAMLSEPARATOR")) :
    try(yamlencode(yamldecode(s)), "")
  ])
}
Please don't propose to address this by splitting multipart documents using antipatterns such as a string split "---".
Yes, maybe that solves your immediate need today, and maybe it works fine for 99% of cases, but what about tomorrow's case, where your HCL falls into the 1% where the string split solution doesn't work,
such as where the string --- is embedded (but properly escaped) in the file body, for example markdown tables "|---|---|" in the YAML, or a multipart MIME body (e.g. PEM-encoded public certificates, ---- BEGIN CERTIFICATE ---) inside a text block in the YAML?
String split is an antipattern. Don't use it: your program will break in the future in a non-obvious way, possibly at a critical moment.
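The markdown-table case is easy to reproduce. The snippet below is an illustrative sketch only: the ConfigMap content and local names are invented, and it uses the plain split("---", ...) form from the workaround earlier in this thread.

```hcl
locals {
  # A single YAML document whose block scalar contains a markdown table.
  # The content is hypothetical; there is no document separator in it.
  single_doc = <<-EOT
    kind: ConfigMap
    data:
      table: |
        | a | b |
        |---|---|
  EOT

  # The naive split still cuts inside the table row, producing 3 pieces
  # from what is really 1 document.
  naive_parts = split("---", local.single_doc)
}

output "naive_part_count" {
  value = length(local.naive_parts)
}
```

None of the resulting pieces is the original document, so any yamldecode() applied to them either fails or silently decodes truncated data.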
@elasticdotventures it's a hacky workaround to a pressing problem that unfortunately was never addressed. The community clearly needs a solution to this, and in the meantime... uses horribly unstable crutches, simply because a bad solution is better than no solution.
Thank you for your continued interest in this issue.
Terraform version 1.8 launches with support of provider-defined functions. It is now possible to implement your own functions! We would love to see this implemented as a provider-defined function.
Please see the provider-defined functions documentation to learn how to implement functions in your providers. If you are new to provider development, learn how to create a new provider with the Terraform Plugin Framework. If you have any questions, please visit the Terraform Plugin Development category in our official forum.
We hope this feature unblocks future function development and provides more flexibility for the Terraform community. Thank you for your continued support of Terraform!
@crw I read through some of the documentation but the docs don't really indicate what this looks like from the user's perspective. They really need a full example going from "here's your provider code" => "here's the actual HCL using that provider".
Do you build a custom function into your provider and it becomes available in the global HCL namespace?
So I could create a "better yaml" provider, users add it like any other provider (provider "better-yaml" {}), and then the function betteryamldecode() is just available to them like any native HCL function?
I think the docs need to explain this better- maybe I'm dumb though lol.
@red8888 tutorials are being developed and should be available "soon." The reference docs do not really have usage information, but check out the usage docs for a random function I happened to have open; they show the naming and usage conventions:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/functions/arn_build
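From the user's side, a provider-defined function is not added to the global namespace; it is called with the provider::<provider name>::<function name> prefix once the provider is declared in required_providers. A minimal sketch using the arn_build function linked above follows; the account ID and role name are placeholder values, and it assumes an AWS provider release that ships functions.

```hcl
terraform {
  # Provider-defined functions require Terraform 1.8 or later.
  required_version = ">= 1.8.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

output "role_arn" {
  # Arguments: partition, service, region, account ID, resource.
  # The account ID and resource are placeholders for illustration.
  value = provider::aws::arn_build("aws", "iam", "", "444455556666", "role/example")
}
```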
We are working on a provider defined function for this particular use-case in the Kubernetes provider here: https://github.com/hashicorp/terraform-provider-kubernetes/pull/2428.
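Usage of such a function might look like the sketch below. It assumes the function from that PR is exposed as manifest_decode_multi and returns one object per YAML document; the function name, the manifests.yaml file, and the resource name are assumptions for illustration, so check the provider's function documentation for the released name and signature.

```hcl
terraform {
  required_version = ">= 1.8.0"

  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

locals {
  # Assumed function name; "manifests.yaml" is a hypothetical file.
  documents = provider::kubernetes::manifest_decode_multi(file("${path.module}/manifests.yaml"))
}

resource "kubernetes_manifest" "all" {
  # Key each document by kind and name, as in the original use case.
  for_each = {
    for doc in local.documents :
    "${doc.kind}--${doc.metadata.name}" => doc
  }
  manifest = each.value
}
```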