terraform-aws-eks Add assertions to help save lots of wasted time with known bad configurations.

Is your request related to a new offering from AWS?

irrelevant

Is your request related to a problem? Please describe.

The combination of the following things causes people to waste a lot of time while trying to get these modules working:

It's very easy to configure these modules in a non-functional way that is known to be non-functional to experts.
The main documentation sources (variable docs and examples) often don't provide enough information to know when you're doing something wrong.
Trying to troubleshoot can take an enormous amount of time. A 1 line fix could take ~half an hour to do 1 cycle of try a fix->apply->wait 10+ mins->destroy->wait 10+ mins.

Some examples:

https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1972#issuecomment-1086239416
- Note the mention of this taking "2 weeks" so far for that person. also note several other people running into the problem, and note that the solution was a single missing line which several experts weren't able to see right away.
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v19.21.0/docs/faq.md#i-received-an-error-expect-exactly-one-securitygroup-tagged-with-kubernetesioclustername-
- Note that the variable docs and examples don't mention the mutually exclusive nature of these settings. This is not an exceptional case.
use_custom_launch_template = false will cause most security group variables on both the eks and the node group modules to be silently ignored. Making the user's root module look very misleading.

Describe the solution you'd like.

Add assertions to catch obviously incorrect usage.

For example, for https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v19.21.0/docs/faq.md#i-received-an-error-expect-exactly-one-securitygroup-tagged-with-kubernetesioclustername- check that only 1 of the two is set. It would probably also allow deleting the FAQ entry then because the question will probably no longer come up at all.

There's probably an overwhelming number of assertions one could add, but at least adding the ones for common issues would already save a LOT of time. If there's an issue filed, or an FAQ entry, those are clearly good candidates.

When transparent assertions aren't possible...

For https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1972, it's more challenging to implement the assertion because you have to check across resources. module.eks.cluster_ip_family needs to correspond with module.vpc_cni_irsa.vpc_cni_enable_ipv(4|6). I don't think there's any way to transparently add assertions to cases like this.

Option 1: change the module structure

Imagine a different approach to the IRSA module where the usage would look like:

module "irsa_vpc_cni" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks2/vpc-cni"
  eks = module.eks
}

Passing the full module.eks in would then allow running the necessary checks.

Side note

This approach would also avoid users having to do any of this stuff

  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv6   = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }

because this would already be baked in to the VPC CNI specific IRSA module (and similarly for other cases where there currently exists an attach_*_policy boolean.

Option 2: separate assert-only modules

I imagine there would be some objections to passing the entire module.eks. I can see good arguments on both sides of that battle. If it's not desired, then the current module structure could be maintained, but also add new assert-only modules that might look like:

module "irsa_vpc_cni_check" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks/vpc-cni-check"
  eks = module.eks
  irsa = module.irsa_vpc_cni
}

(a very rough sketch of an idea, I'm not saying to design it to look exactly like this).

Describe alternatives you've considered.

Completely rework the module design to follow the "make it hard to use wrong" principle so that assertions would be less necessary. The Option 1 above goes in this direction but it could be applied in a broader scope across all the modules. (That's a much bigger discussion though, it could help a lot with this problem, but it probably doesn't make sense to discuss here).

Additional context

none

Dec 28 '23 20:12 brianbraunstein

We would never create an argument that relies on an entire module output - that would be a very poor, fragile, and restrictive choice. I also don't agree with the assumption that assertions fix the issues linked. We cannot do anything within a module that alerts users to a missing provider implementation (first issue referenced). I do not see the validity nor the feasibility of this request

Dec 28 '23 21:12 bryantbiggs

I figured there might be resistance to passing entire module outputs around, which is why I included "Option 2" above. It still passes the module outputs around but doesn't do so in the main parts of the code, only these new and optional assertion submodules. If the modules are released in the same git repo and required to be at the same version, then I think that resolves the fragility issue, and the synced version restriction would only apply to the optional assertion submodules, so useful when that's possible, and no different when it's not possible.

That said, I'm mainly trying to identify an issue and start some brainstorm on solutions. I'm not pushing for any specific solution, just coming with some ideas to kick start the brainstorming process, sort of in the spirit of "come with solutions not problems", but I'm very open to other solutions that solve the problem. I also recognize it's probably more of a long term thing that requires some thought rather than a quick bug fix. If this isn't the right forum for that please let me know a better place. I'm happy to get a on voice or video call too if it helps.

Do you know of any precedent for requiring version syncing of co-released submodules? It seems like a handy tool that could help libraries written in Terraform to solve problems with a more user friendly API. Modules are roughly equivalent to functions in other languages, with complex input and output structs. In other languages if you produce a library that has functions and complex structs, you wouldn't worry about passing pointers to those complex structs between different parts of the API.

Something like this is fine and essentially analogous to passing entire module outputs.

class Foo { ... };

Foo* NewFoo() { ...return a_foo; }

foo = NewFoo();
FlipAFoo(foo);

If you declare parts of the Foo as private (or an equivalent, perhaps just documented to be internal since Terraform doesn't actually have a concept of private) then it's actually potentially LESS fragile than a solution which gives the entire module output to the user. It would allow maintaining the same API that passes module outputs around while modifying what's actually being passed around internally, all transparent to API users.

Jan 02 '24 20:01 brianbraunstein

Do you know of any precedent for requiring version syncing of co-released submodules? It seems like a handy tool that could help libraries written in Terraform to solve problems with a more user friendly API. Modules are roughly equivalent to functions in other languages, with complex input and output structs. In other languages if you produce a library that has functions and complex structs, you wouldn't worry about passing pointers to those complex structs between different parts of the API.

Terraform is not like other software languages. You don't have abstractions - you can't put something behind an interface and have free will to change that. Nearly anything you do, its felt by users. If I change the shape of an input or output, theres nothing that can transform that or mask it, users experience that. Theres also nothing I can do to refactor a module without users feeling that (save for some of the newer features such as move blocks but those are still quite basic - you cannot use them on looping constructs).

I'm not sure what the issue is thats trying to be solved here. Said differently - how would you prevent "bad configurations" with things like Ansible configurations or Kubernetes manifests?

Jan 02 '24 21:01 bryantbiggs

how would you prevent "bad configurations" with things like Ansible configurations or Kubernetes manifests

Here's an example with kube manifests when using jsonnet:

https://github.com/kube-libsonnet/kube-libsonnet/blob/8a6f8eb32dc6247fc970f4b0ca3a9130b7ef4f79/kube.libsonnet#L361-L368

There are many asserts where the library is trying to protect you from known silly configurations to save time debugging (I just found this particular one kinda funny because obviously this person keeps doing the same bad thing even though they know better, assertions are also great for experts, especially when certain things tend to be foot guns)

Terraform is not like other software languages

So it sounds like with Terraform you just can't get rid of foot guns or even protect users from them with asserts. I thought perhaps the trick where you have co-released submodules and require the versions to be synced would be one approach, especially when it's a separate "assert only" helper submodule that doesn't mess up the core interface. It looks like you'd rather not go that route though. It also sounds like you think there's really no solution using Terraform. That's fine, you can go ahead an close this. I might play around a bit on my side and see what I can come up with...and then give up once I realize you're right about Terraform ;)

Thanks for the time and patience to go back and forth with me.

Jan 03 '24 18:01 brianbraunstein

thank you! closing this one out for now, we'll continue to evaluate the functionality that the Terraform core provides for future improvements

Jan 26 '24 21:01 bryantbiggs

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Feb 26 '24 02:02 github-actions[bot]

terraform-aws-eks terraform-aws-eks copied to clipboard

Add assertions to help save lots of wasted time with known bad configurations.

Is your request related to a new offering from AWS?

Is your request related to a problem? Please describe.

Describe the solution you'd like.

Add assertions to catch obviously incorrect usage.

When transparent assertions aren't possible...

Option 1: change the module structure

Side note

Option 2: separate assert-only modules

Describe alternatives you've considered.

Additional context

terraform-aws-eks
terraform-aws-eks copied to clipboard