terraform-provider-iterative icon indicating copy to clipboard operation
terraform-provider-iterative copied to clipboard

Accept scripts without requiring additional Base64 encoding

Open 0x2b3bfa0 opened this issue 4 years ago • 19 comments

Terraform support for heredocs and files would play nicely with literal scripts and wouldn't require users to encode them in advance or resort to file and string encoding functions. We might still use the latter internally for foolproof escaping of templated files, but users should be able of writing plain scripts manually.

0x2b3bfa0 avatar Mar 21 '21 13:03 0x2b3bfa0

@0x2b3bfa0 I think is not a great idea. All the vendors enforces you to base64 is not nothing new. Also for the cml-runner command is going to be a mess

DavidGOrtega avatar Mar 23 '21 09:03 DavidGOrtega

Ideally, performing the following modification to the pertaining template line would be enough on the CML side: ${startup_script ? `startup_script = base64decode("${startup_script}")` : ''} I thought it could be a good move for the user experience, but... 🤷‍♂️

0x2b3bfa0 avatar Mar 23 '21 10:03 0x2b3bfa0

@0x2b3bfa0 Its breaking CML but also basic terraform provider. Now the TPI is not accepting the base64 encoded script.

DavidGOrtega avatar Jul 26 '21 16:07 DavidGOrtega

@0x2b3bfa0 one of the main reasons also is that base64 is far more robust than this heredoc. It's terribly painful to make this work through a template

DavidGOrtega avatar Jul 26 '21 17:07 DavidGOrtega

but users should be able of writing plain scripts manually.

Users have been able to use custom script in Azure and AWS in base64 so we are not worse!

DavidGOrtega avatar Jul 26 '21 17:07 DavidGOrtega

@DavidGOrtega, given that the provider still is in its early stages of development, I think that we can afford introducing this change without breaking anything.

As stated above, CML wouldn't have to use the here document syntax and can continue to provide scripts in Base64 to avoid string escaping issues.

0x2b3bfa0 avatar Jul 26 '21 21:07 0x2b3bfa0

Follow-up of iterative/cml#168

Currently, resource_cml_runner takes Base64-encoded scripts, while resource_machine takes raw scripts. Independently of the approach we choose, it should be the same for both resources. I would prefer to modify the former to behave like the latter.

0x2b3bfa0 avatar Sep 04 '21 10:09 0x2b3bfa0

@0x2b3bfa0 @DavidGOrtega Why not have both in a consistent, explicit manner - 2 separate fields: startup_script and startup_script_base64 . If both fields are non-"" it will fail (no guessing!)

omesser avatar Apr 10 '22 18:04 omesser

Why not have both in a consistent, explicit manner - 2 separate fields

Because Terraform has an inbuilt base64decode function with this exact purpose.

0x2b3bfa0 avatar Apr 10 '22 18:04 0x2b3bfa0

Yeah (obviously 😬 ) but not sure I follow what are you suggesting wrt the choice of whether to expect file contents encoded or not (meaning: from the user). b64 codec'ing is easy both for the user and for us, so why take away the benefit of giving the user the flexibility to provide the script either base64 encoded or not?

omesser avatar Apr 10 '22 22:04 omesser

not sure I follow what are you suggesting wrt the choice of whether to expect file contents encoded or not

I propose[^1] to accept raw scripts in both iterative_machine & iterative_cml_runner and drop support for Base64–encoded data.

Currently, iterative_machine expects raw scripts and iterative_cml_runner expects Base64–encoded scripts. 🙃

b64 codec'ing is easy both for the user and for us, so why take away the benefit of giving the user the flexibility to provide the script either base64 encoded or not?

Expecting only raw scripts doesn't “take away any benefits”, it just avoids polluting the API with a redundant argument. See the collapsed comment below for some examples.

[^1]: https://github.com/iterative/terraform-provider-iterative/issues/91#issuecomment-912951829, not that important since iterative/cml#461 anyway.

0x2b3bfa0 avatar Apr 10 '22 23:04 0x2b3bfa0

Common

This is what users would do in both cases when the script is not encoded.

Raw String

resource "iterative_cml_runner" "example" {
  startup_script = "IyEvYmluL3NoCmVjaG8gdGFzdGVmdWwK"
}

Raw Heredoc

resource "iterative_cml_runner" "example" {
  startup_script = <<-END
    #!/bin/sh
    echo tasteful
  END
}

Raw File

resource "iterative_cml_runner" "example" {
  startup_script = file("${path.root}/tasteful.sh")
}

Different Arguments

This is what users would have to do when the script is Base64–encoded, if the resource implements a separate Base64 argument.

Base64 String

resource "iterative_cml_runner" "example" {
  startup_script_base64 = "IyEvYmluL3NoCmVjaG8gbm90IHF1aXRlCg=="
}

Base64 Heredoc

resource "iterative_cml_runner" "example" {
  startup_script_base64 = <<-END
    IyEvYmluL3NoCmVjaG8gbm90IHF1aXRlCg==
  END
}

Base64 File

resource "iterative_cml_runner" "example" {
  startup_script_base64 = file("${path.root}/not_quite.sh.base64")
}

Raw Only

This is what users would have to do when the script is Base64–encoded, using the builtin functions.

Base64 String

resource "iterative_cml_runner" "example" {
  startup_script = base64decode("IyEvYmluL3NoCmVjaG8gdGFzdGVmdWwK")
}

Base64 Heredoc

resource "iterative_cml_runner" "example" {
  startup_script = base64decode(
    <<-END
      IyEvYmluL3NoCmVjaG8gdGFzdGVmdWwK
    END
  )
}

Base64 File

resource "iterative_cml_runner" "example" {
  startup_script = base64decode(file("${path.root}/tasteful.sh.base64"))
}

0x2b3bfa0 avatar Apr 10 '22 23:04 0x2b3bfa0

@0x2b3bfa0 - due to this I suggest to think whether it's beneficial to drop the b64 encoded field support. Hence the suggestion to have both co-exist. I can think of a few scenarios where it's worth to have this base64 support in place, I'm not sure how relevant they are though:

  • When programmatically generating tf configs and templating values into them, much easier to work and template in a b64 string
  • CLI support - If this can be populated in any scenario by an input var e.g. terraform apply -var 'X.Y.startup_script_base64=<stuff>

omesser avatar Apr 11 '22 00:04 omesser

When programmatically generating tf configs and templating values into them

See https://github.com/iterative/terraform-provider-iterative/issues/91#issuecomment-804783976; startup_script = base64decode("{{startup_script_base64}}") is equally convenient. In fact, that comment proposes a verbatim drop–in replacement for our current template.

0x2b3bfa0 avatar Apr 11 '22 00:04 0x2b3bfa0

CLI support - If this can be populated in any scenario by an input var e.g. terraform apply -var 'startup_script_base64=<stuff>'

When using terraform -var UTF–8 strings which don't contain \0 don't need Base64 encoding. Anyway, the inbuilt functions provide the required functionality:

variable "input" {
}

output "output" {
  value = base64decode(var.input)
}
terraform apply -var input="$(base64 input)"

0x2b3bfa0 avatar Apr 11 '22 00:04 0x2b3bfa0

With the risk of repeating the point a bit 😄

Agreed the bas64decode function exists and things can be decoded in various places and achieved either way. The point I'm making is about user experience - Giving users what they expect and keeping their syntax clean and flat.

I don't think that this

startup_script = base64decode("{{startup_script_base64}}")

is an equally convenient alternative to this:

startup_script_base64 = "{{startup_script_base64}}"

And saving the user the need to convert their input is a good perk UX wise, when hitting a common use cases, and if I catch the drift here, it can be very common.

It might be worth "polluting the api" with another (optional) field. Granted, this is opinionated, but do consider please - making lives easier for users with a clean interface they will actually look at vs 1 less optional field (what are the operational costs of this? for them? for us?).

Same thing in CLI -var input="$(base64 input)" vs -var input_base64="input" - first option is cumbersome and most people don't enjoy shell fun-times. This would have been much worse in a non-toy example (i.e. real contents in input) - escaping a script's content to put it as a raw string in a shell command is not a nice sight to behold (hence not a good practice), base64 is very often used as a catch-all to avoid escaping issues even though the contents can be properly escaped if you care to spend the time (ML engineers won't, right?).

This might save the users a few minutes scratching their heads building the tf config and the code/commands around it, and more importantly, keeping terraform configs as flat+simple as possible on the user end, so they can debug their inputs outside of terraform and get full easy visibility to what the provider gets as inputs / vars. Also, there's the upside of eliminating ambiguity as per what does startup_script expect (raw or base64), due to the existence of startup_script_base64 - seeing both in the docs makes it all clear. Same goes, btw for startup_script_file - these are basically convenient wrappers. The fact that 2 approaches were mixed means that there is place for ambiguity and it might be nice to reduce it

omesser avatar Apr 11 '22 09:04 omesser

I'm also leaning towards this

  • startup_script: change to expect raw (affects CML)
  • startup_script_base64: new, expects base64 (CML would switch to using this)
  • error on both defined (conflicts)

casperdcl avatar Apr 11 '22 20:04 casperdcl

The point I'm making is about user experience - Giving users what they expect and keeping their syntax clean and flat.

If we're talking of existing Terraform power users, it looks like we have irreconcilliable differences with regard to their alleged expectations. ⚔️ 😅

I don't think that this [startup_script = base64decode("data")] is an equally convenient alternative to this [startup_script_base64 = "data"]

Neither do I: to my mind, the former is much more convenient both for users and for ourselves. 😈

And saving the user the need to convert their input is a good perk UX wise, when hitting a common use cases, and if I catch the drift[^1] here, it can be very common.

Providing a Base64–encoded script is not a valid[^2] use case, unless proven otherwise. It's just something we do internally on CML to avoid the need of escaping our runner templates.

If this argument accepted arbitrary binary data (as opposed to UTF–8 text), it would make sense to use Base64 as the only/preferred input format, but this is not the case.

This argument is not mapped to the cloud–init user data value read by cloud instances. It's just being injected into a bigger templated script, and providing binary data wouldn't have other effect beyond upsetting the shell interpreter: https://github.com/iterative/terraform-provider-iterative/blob/2dd6c603529880ce3f942d44d916ff03f1dbce12/iterative/resource_runner.go#L340-L342

Moreover, the current approach is doubly inconvenient: the primary[^4] consumer of the iterative_cml_runner resource is the cml runner command, and this particular argument is exposed through the --cloud-startup-script command–line option.

Currently, this option “run[s] the provided Base64-encoded Linux shell script during the instance initialization”, so users have[^3] to execute cml runner --cloud-startup-script="$(base64 --wrap 0 <<< 'echo hello world')" and take care of encoding for no reason; that is bad user experience.

User Data

Just for reference, the official AWS provider accepts both forms, exactly like in your proposal, although other providers only take plain text.

[^1]: The best cloud infrastructure pun I have ever seen! 😂 [^2]: Replace with “common” if it sounds like a baseless value judgement [^3]: See cml#536 (comment) and cml#772 (comment) for hitorical evidence [^4]: And probably the only, by the way (insert tumbleweed emoji)

0x2b3bfa0 avatar Apr 13 '22 02:04 0x2b3bfa0

The fact that 2 approaches were mixed means that there is place for ambiguity and it might be nice to reduce it

Indeed, it might be nice to reduce the ambiguity, by aligning their behavior 😄

0x2b3bfa0 avatar Apr 13 '22 02:04 0x2b3bfa0

Given the status of the current API, this can be safely demoted to p2-nice-to-have and will be addressed tangentially when reworking the cml runner --cloud internals.

0x2b3bfa0 avatar Oct 10 '22 03:10 0x2b3bfa0

I think this can just be closed.

  • iterative_machine is essentially deprecated / will be significantly reworked if used when we make another go at the dvc machine ideas.
  • iterative_runner is not intended to be used by a user directly and this type of script logic can just be handled by cml when it builds it's main.tf internally.

dacbd avatar Oct 10 '22 14:10 dacbd

I thought this issue about allowing cml runner launch --cloud-startup-script to be plaintext?

casperdcl avatar Oct 12 '22 13:10 casperdcl

I thought this issue about allowing cml runner launch --cloud-startup-script to be plaintext?

even if it is, that shouldn't require any changes to tpi like has been discussed in this thread. There is already a related issue on cml to accept a file for that parameter. plaintext, base64, or file don't need different interfaces in the terraform scheme for the iterative_runner resource, they can all use the same one.

dacbd avatar Oct 12 '22 14:10 dacbd

https://github.com/iterative/cml/issues/1167 indeed

casperdcl avatar Oct 12 '22 20:10 casperdcl