terraform-provider-iterative
terraform-provider-iterative copied to clipboard
Accept scripts without requiring additional Base64 encoding
Terraform support for heredocs and files would play nicely with literal scripts and wouldn't require users to encode them in advance or resort to file and string encoding functions. We might still use the latter internally for foolproof escaping of templated files, but users should be able of writing plain scripts manually.
@0x2b3bfa0 I think is not a great idea. All the vendors enforces you to base64 is not nothing new. Also for the cml-runner command is going to be a mess
Ideally, performing the following modification to the pertaining template line would be enough on the CML side:
${startup_script ? `startup_script = base64decode("${startup_script}")` : ''}
I thought it could be a good move for the user experience, but... 🤷♂️
@0x2b3bfa0 Its breaking CML but also basic terraform provider. Now the TPI is not accepting the base64 encoded script.
@0x2b3bfa0 one of the main reasons also is that base64 is far more robust than this heredoc. It's terribly painful to make this work through a template
but users should be able of writing plain scripts manually.
Users have been able to use custom script in Azure and AWS in base64 so we are not worse!
@DavidGOrtega, given that the provider still is in its early stages of development, I think that we can afford introducing this change without breaking anything.
As stated above, CML wouldn't have to use the here document syntax and can continue to provide scripts in Base64 to avoid string escaping issues.
Follow-up of iterative/cml#168
Currently, resource_cml_runner takes Base64-encoded scripts, while resource_machine takes raw scripts. Independently of the approach we choose, it should be the same for both resources. I would prefer to modify the former to behave like the latter.
@0x2b3bfa0 @DavidGOrtega
Why not have both in a consistent, explicit manner - 2 separate fields: startup_script and startup_script_base64 .
If both fields are non-"" it will fail (no guessing!)
Why not have both in a consistent, explicit manner - 2 separate fields
Because Terraform has an inbuilt base64decode function with this exact purpose.
Yeah (obviously 😬 ) but not sure I follow what are you suggesting wrt the choice of whether to expect file contents encoded or not (meaning: from the user). b64 codec'ing is easy both for the user and for us, so why take away the benefit of giving the user the flexibility to provide the script either base64 encoded or not?
not sure I follow what are you suggesting wrt the choice of whether to expect file contents encoded or not
I propose[^1] to accept raw scripts in both iterative_machine & iterative_cml_runner and drop support for Base64–encoded data.
Currently, iterative_machine expects raw scripts and iterative_cml_runner expects Base64–encoded scripts. 🙃
b64 codec'ing is easy both for the user and for us, so why take away the benefit of giving the user the flexibility to provide the script either base64 encoded or not?
Expecting only raw scripts doesn't “take away any benefits”, it just avoids polluting the API with a redundant argument. See the collapsed comment below for some examples.
[^1]: https://github.com/iterative/terraform-provider-iterative/issues/91#issuecomment-912951829, not that important since iterative/cml#461 anyway.
Common
This is what users would do in both cases when the script is not encoded.
Raw String
resource "iterative_cml_runner" "example" {
startup_script = "IyEvYmluL3NoCmVjaG8gdGFzdGVmdWwK"
}
Raw Heredoc
resource "iterative_cml_runner" "example" {
startup_script = <<-END
#!/bin/sh
echo tasteful
END
}
Raw File
resource "iterative_cml_runner" "example" {
startup_script = file("${path.root}/tasteful.sh")
}
Different Arguments
This is what users would have to do when the script is Base64–encoded, if the resource implements a separate Base64 argument.
Base64 String
resource "iterative_cml_runner" "example" {
startup_script_base64 = "IyEvYmluL3NoCmVjaG8gbm90IHF1aXRlCg=="
}
Base64 Heredoc
resource "iterative_cml_runner" "example" {
startup_script_base64 = <<-END
IyEvYmluL3NoCmVjaG8gbm90IHF1aXRlCg==
END
}
Base64 File
resource "iterative_cml_runner" "example" {
startup_script_base64 = file("${path.root}/not_quite.sh.base64")
}
Raw Only
This is what users would have to do when the script is Base64–encoded, using the builtin functions.
Base64 String
resource "iterative_cml_runner" "example" {
startup_script = base64decode("IyEvYmluL3NoCmVjaG8gdGFzdGVmdWwK")
}
Base64 Heredoc
resource "iterative_cml_runner" "example" {
startup_script = base64decode(
<<-END
IyEvYmluL3NoCmVjaG8gdGFzdGVmdWwK
END
)
}
Base64 File
resource "iterative_cml_runner" "example" {
startup_script = base64decode(file("${path.root}/tasteful.sh.base64"))
}
@0x2b3bfa0 - due to this I suggest to think whether it's beneficial to drop the b64 encoded field support. Hence the suggestion to have both co-exist. I can think of a few scenarios where it's worth to have this base64 support in place, I'm not sure how relevant they are though:
- When programmatically generating tf configs and templating values into them, much easier to work and template in a b64 string
- CLI support - If this can be populated in any scenario by an input var e.g.
terraform apply -var 'X.Y.startup_script_base64=<stuff>
When programmatically generating tf configs and templating values into them
See https://github.com/iterative/terraform-provider-iterative/issues/91#issuecomment-804783976; startup_script = base64decode("{{startup_script_base64}}") is equally convenient. In fact, that comment proposes a verbatim drop–in replacement for our current template.
CLI support - If this can be populated in any scenario by an input var e.g.
terraform apply -var 'startup_script_base64=<stuff>'
When using terraform -var UTF–8 strings which don't contain \0 don't need Base64 encoding. Anyway, the inbuilt functions provide the required functionality:
variable "input" {
}
output "output" {
value = base64decode(var.input)
}
terraform apply -var input="$(base64 input)"
With the risk of repeating the point a bit 😄
Agreed the bas64decode function exists and things can be decoded in various places and achieved either way.
The point I'm making is about user experience - Giving users what they expect and keeping their syntax clean and flat.
I don't think that this
startup_script = base64decode("{{startup_script_base64}}")
is an equally convenient alternative to this:
startup_script_base64 = "{{startup_script_base64}}"
And saving the user the need to convert their input is a good perk UX wise, when hitting a common use cases, and if I catch the drift here, it can be very common.
It might be worth "polluting the api" with another (optional) field. Granted, this is opinionated, but do consider please - making lives easier for users with a clean interface they will actually look at vs 1 less optional field (what are the operational costs of this? for them? for us?).
Same thing in CLI -var input="$(base64 input)" vs -var input_base64="input" - first option is cumbersome and most people don't enjoy shell fun-times. This would have been much worse in a non-toy example (i.e. real contents in input) - escaping a script's content to put it as a raw string in a shell command is not a nice sight to behold (hence not a good practice), base64 is very often used as a catch-all to avoid escaping issues even though the contents can be properly escaped if you care to spend the time (ML engineers won't, right?).
This might save the users a few minutes scratching their heads building the tf config and the code/commands around it, and more importantly, keeping terraform configs as flat+simple as possible on the user end, so they can debug their inputs outside of terraform and get full easy visibility to what the provider gets as inputs / vars.
Also, there's the upside of eliminating ambiguity as per what does startup_script expect (raw or base64), due to the existence of startup_script_base64 - seeing both in the docs makes it all clear. Same goes, btw for startup_script_file - these are basically convenient wrappers. The fact that 2 approaches were mixed means that there is place for ambiguity and it might be nice to reduce it
I'm also leaning towards this
startup_script: change to expect raw (affects CML)startup_script_base64: new, expects base64 (CML would switch to using this)- error on both defined (conflicts)
The point I'm making is about user experience - Giving users what they expect and keeping their syntax clean and flat.
If we're talking of existing Terraform power users, it looks like we have irreconcilliable differences with regard to their alleged expectations. ⚔️ 😅
I don't think that this [
startup_script = base64decode("data")] is an equally convenient alternative to this [startup_script_base64 = "data"]
Neither do I: to my mind, the former is much more convenient both for users and for ourselves. 😈
And saving the user the need to convert their input is a good perk UX wise, when hitting a common use cases, and if I catch the drift[^1] here, it can be very common.
Providing a Base64–encoded script is not a valid[^2] use case, unless proven otherwise. It's just something we do internally on CML to avoid the need of escaping our runner templates.
If this argument accepted arbitrary binary data (as opposed to UTF–8 text), it would make sense to use Base64 as the only/preferred input format, but this is not the case.
This argument is not mapped to the cloud–init user data value read by cloud instances. It's just being injected into a bigger templated script, and providing binary data wouldn't have other effect beyond upsetting the shell interpreter: https://github.com/iterative/terraform-provider-iterative/blob/2dd6c603529880ce3f942d44d916ff03f1dbce12/iterative/resource_runner.go#L340-L342
Moreover, the current approach is doubly inconvenient: the primary[^4] consumer of the iterative_cml_runner resource is the cml runner command, and this particular argument is exposed through the --cloud-startup-script command–line option.
Currently, this option “run[s] the provided Base64-encoded Linux shell script during the instance initialization”, so users have[^3] to execute cml runner --cloud-startup-script="$(base64 --wrap 0 <<< 'echo hello world')" and take care of encoding for no reason; that is bad user experience.
User Data
Just for reference, the official AWS provider accepts both forms, exactly like in your proposal, although other providers only take plain text.
-
Amazon Web Services:
aws_instanceexposes bothuser_dataanduser_data_base64Use this [
user_data_base64] instead ofuser_datawhenever the value is not a valid UTF-8 string. For example, gzip-encoded user data must be base64-encoded and passed via this argument to avoid corruption. -
Microsoft Azure:
azurerm_virtual_machineexposes onlycustom_datawhich expects a raw stringInternally, Terraform will base64 encode this value before sending it to the API.
-
Google Cloud:
google_compute_instanceexposes onlymetadata_startup_scriptwhich expects a raw string
[^1]: The best cloud infrastructure pun I have ever seen! 😂 [^2]: Replace with “common” if it sounds like a baseless value judgement [^3]: See cml#536 (comment) and cml#772 (comment) for hitorical evidence [^4]: And probably the only, by the way (insert tumbleweed emoji)
The fact that 2 approaches were mixed means that there is place for ambiguity and it might be nice to reduce it
Indeed, it might be nice to reduce the ambiguity, by aligning their behavior 😄
Given the status of the current API, this can be safely demoted to p2-nice-to-have and will be addressed tangentially when reworking the cml runner --cloud internals.
I think this can just be closed.
iterative_machineis essentially deprecated / will be significantly reworked if used when we make another go at the dvc machine ideas.iterative_runneris not intended to be used by a user directly and this type of script logic can just be handled by cml when it builds it'smain.tfinternally.
I thought this issue about allowing cml runner launch --cloud-startup-script to be plaintext?
I thought this issue about allowing
cml runner launch --cloud-startup-scriptto be plaintext?
even if it is, that shouldn't require any changes to tpi like has been discussed in this thread. There is already a related issue on cml to accept a file for that parameter. plaintext, base64, or file don't need different interfaces in the terraform scheme for the iterative_runner resource, they can all use the same one.
https://github.com/iterative/cml/issues/1167 indeed