terraform-aws-eks-cluster icon indicating copy to clipboard operation
terraform-aws-eks-cluster copied to clipboard

Proper management of aws auth config map

Open sebastianmacarescu opened this issue 2 years ago • 12 comments

what

  • Use newer kubernetes_config_map_v1_data to force management of config map from a single place and have field ownership
  • Implement self reference as described here: https://github.com/cloudposse/terraform-aws-eks-cluster/issues/155#issuecomment-1164689457 in order to detect if any iam roles were removed from terraform config
  • Preserve any existing config map setting that was added outside terraform; basically terraform will manage only what is passed in variables

why

Mainly the reasons from #155.

references

*closes #155

sebastianmacarescu avatar Jul 20 '22 12:07 sebastianmacarescu

Pulling in @Nuru since he's got strong opinions in this area of the code (and rightfully so 👍)

Gowiem avatar Jul 20 '22 18:07 Gowiem

This pull request is now in conflict. Could you fix it @sebastianmacarescu? 🙏

mergify[bot] avatar Jul 25 '22 04:07 mergify[bot]

@Nuru I have refactored this:

  • removed kubernetes_config_map_v1_data
  • remove the state data source - this failed on first terraform apply as the tf state didn't exist yet and the data source must reference existing stuff
  • save the previous terraform input in AWS SSM Parameter and read it on apply to detect if any roles must be removed

This seems to work a lot better and is a lot cleaner.

As for this issue: https://github.com/hashicorp/terraform-provider-kubernetes/pull/1671#issuecomment-1086032868 "A https://github.com/aidanmelen/terraform-aws-eks-auth/issues/14 will not automatically create an aws-auth configmap." I didn't manage to replicate it, my data source successfully reads the aws config map. So I suspect AWS automatically creates one.

sebastianmacarescu avatar Aug 26 '22 13:08 sebastianmacarescu

Are there plans to merge this?

stecullum avatar Oct 14 '22 10:10 stecullum

@stevec-aztech This was completely on hold pending resolution of https://github.com/hashicorp/terraform-provider-aws/issues/25335

@sebastianmacarescu I really do appreciate the effort you have put into this. How much and what kind of testing have you done on it? Just looking at it makes me scared, particularly with the regular expressions, as those have a way of causing hard-to-find bugs and security vulnerabilities.

What size are the strings you have been storing in parameter store? I would like to know where the breaking point is, as some people have added a lot of roles to their auth map.

Plus I'm always worried about the destroy behavior, as problems in destroying the cluster keep cropping up.

I really am looking forward to AWS giving us a proper API for the auth map and we could end all these workarounds.

Nuru avatar Oct 15 '22 07:10 Nuru

So I have spent some time to realistically generate some big map_aditional_iam_roles for the config map as this is the only stuff I store in AWS Paramter Store.

The maximum size of the parameter value for Advanced Tier in AWS SSM Parameter Store is 8KB as can be seen HERE.

The current implementation allows for around 30 roles. But we can use base64gzip on the json data and we will get to around 60 roles. Plus we can get rid of those regexes and be a lot safer.

For who wants to play the python code to generate a random dataset (really long values for the iam role name, group names and usernames that most won't have) is:

import string
import random
import json
import gzip
import base64
from sys import getsizeof

def generate_role():
	# Role name between 30 and 64 chars
	account_id = ''.join(random.choices(string.digits, k=12))
	role_name = ''.join(random.choices(string.ascii_lowercase, k=random.randint(30, 64)))
	arn = f"arn:aws:iam::{account_id}/role/{role_name}"
	return arn

def generate_user():
	return ''.join(random.choices(string.ascii_lowercase, k=random.randint(10, 36)))

def generate_groups():
	return random.choices([''.join(random.choices(string.ascii_lowercase, k=random.randint(4, 20))) for i in range(10)], k=random.randint(1, 8))

data = [
	{
		"groups": generate_groups(),
		"rolearn": generate_role(),
		"username": generate_user()
	}
	for i in range(60)
]
json_str = json.dumps(data)

print("\nInitial data\n---------------")
# print(f"Data {json_str}")
print(f"Number of chars {len(json_str)}")
print(f"Size in bytes 	{getsizeof(json_str)}")
# print(f"Number KB (from chars) {len(json_str) / 1000}")

compressed = gzip.compress(json_str.encode('utf-8'))
encoded = base64.b64encode(compressed).decode('utf-8')

print("\nGzip compression\n---------------")
# print(f"Gzip {compressed}")
print(f"Number of chars {len(compressed)}")
print(f"Size in bytes 	{getsizeof(compressed)}")

# Emulate https://www.terraform.io/language/functions/base64gzip
print("\nB64 encoding of gzip\n---------------")
# print(f"B64 encoded of gzip {encoded}")
print(f"Number of chars {len(encoded)}")
print(f"Size in bytes 	{getsizeof(encoded)}")

Some output:

Initial data
---------------
Number of chars 12881
Size in bytes 	12930

Gzip compression
---------------
Number of chars 5772
Size in bytes 	5805

B64 encoding of gzip
---------------
Number of chars 7696
Size in bytes 	7745
Initial data
---------------
Number of chars 12515
Size in bytes 	12564

Gzip compression
---------------
Number of chars 5651
Size in bytes 	5684

B64 encoding of gzip
---------------
Number of chars 7536
Size in bytes 	7585

Also I'm running the current code in production EKS cluster and works just fine as it is, with around 8 aditional iam roles in the config map.

If you think this can be merged I can simplify the code and use base64gzip to eliminate those hard to understand regexes, plus increasing the number of extra iam roles. Also I can add a note in the doumentation and possibly terraform parameter validation.

sebastianmacarescu avatar Oct 15 '22 09:10 sebastianmacarescu

@sebastianmacarescu Thank you for this extra effort.

If you think this can be merged I can simplify the code and use base64gzip to eliminate those hard to understand regexes, plus increasing the number of extra iam roles. Also I can add a note in the doumentation and possibly terraform parameter validation.

We have other issues with using remote state triggering Terraform bugs, so yes, if you can use base64gzip and simplify the code, I think I can work it in with some other stuff I'm doing this weekend. I may need to make some changes to make the SSM parameter an optional solution (allow people to keep using the current "remote state" solution). You certainly deserve credit and we can give you credit in the release notes, but if it turns out to be easier, would it be OK with you if I copy your work into a new PR I am working on and close this as superseded by that? If that is not OK, I totally understand, and it would not be much extra work for me, just checking on how you feel about it.

Also, we want to change the version pinning to be >= 3.38, !=4.18.0, !=4.19.0 ..., !=4.32.0

Nuru avatar Oct 15 '22 20:10 Nuru

Hi @Nuru , I will update the code to remove the regexes together with the new version constraint. I would keep the work in this PR and separate release because to me it seems like a medium change that should not be mixed with other stuff features. Moreover it will be easier to understand the new changes when browsing the comments on the original issue here. But if it's extra hard for you to integrate in a separate release feel free to integrate in your new PR. Thank you :)

sebastianmacarescu avatar Oct 19 '22 12:10 sebastianmacarescu

Hi @Nuru. Can you review the new changes?

  • removed those regexes
  • replaced parameter store with secrets manager due to bigger data size (8KB vs 64KB).
  • updated readme with some more edge cases on delete failures together with links to terraform eks provider bugs
  • rebased the work on the latest upstream release

This new implementation will allow us to store around 180 IAM roles specified in the terraform configuration. I have tested with both terraform create, update, add and remove iam roles from the config map and it works great.

Also there is no gunzip in terraform so storing the json string as gzip compressed base64 would not work: https://github.com/hashicorp/terraform/issues/22568

sebastianmacarescu avatar Nov 03 '22 14:11 sebastianmacarescu

Guys, any progress on this one?

salemgolemugoo avatar Dec 23 '22 20:12 salemgolemugoo

Since this went far on feedback and there is interest, friendly ping of @Nuru + @aknysh

Gowiem avatar Oct 19 '23 14:10 Gowiem

final ping for @Nuru , @aknysh and @Gowiem

hans-d avatar Mar 05 '24 00:03 hans-d

This pull request now has conflicts. Could you fix it @sebastianmacarescu? 🙏

mergify[bot] avatar Mar 11 '24 15:03 mergify[bot]

This PR has been closed due to inactivity and merge conflicts. Please resolve the conflicts and reopen if necessary.

mergify[bot] avatar Mar 11 '24 15:03 mergify[bot]

Thanks @sebastianmacarescu for creating this pull request!

A maintainer will review your changes shortly. Please don't be discouraged if it takes a while.

While you wait, make sure to review our contributor guidelines.

[!TIP]

Need help or want to ask for a PR review to be expedited?

Join us on Slack in the #pr-reviews channel.

mergify[bot] avatar Mar 11 '24 15:03 mergify[bot]