terraform-aws-eks-cluster
Proper management of the aws-auth ConfigMap
what
- Use the newer `kubernetes_config_map_v1_data` resource to force management of the ConfigMap from a single place and gain field ownership
- Implement a self-reference as described here: https://github.com/cloudposse/terraform-aws-eks-cluster/issues/155#issuecomment-1164689457 in order to detect whether any IAM roles were removed from the Terraform config
- Preserve any existing ConfigMap settings that were added outside Terraform; basically, Terraform will manage only what is passed in via variables
why
Mainly the reasons from #155.
references
- closes #155
Pulling in @Nuru since he's got strong opinions in this area of the code (and rightfully so 👍)
This pull request is now in conflict. Could you fix it @sebastianmacarescu? 🙏
@Nuru I have refactored this:
- removed `kubernetes_config_map_v1_data`
- removed the state data source: it failed on the first terraform apply because the tf state didn't exist yet, and the data source must reference existing resources
- saved the previous Terraform input in an AWS SSM parameter and read it on apply to detect whether any roles must be removed
This seems to work a lot better and is a lot cleaner.
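The removal-detection idea can be sketched in a few lines of Python (hypothetical names; the actual implementation stores the previous input as a JSON string in an SSM parameter and diffs it against the current variables on apply):

```python
import json

def roles_to_remove(previous_json, current_roles):
    """Return ARNs present in the previously stored input but missing
    from the current Terraform input (i.e. roles that were removed)."""
    previous_roles = json.loads(previous_json)
    current_arns = {role["rolearn"] for role in current_roles}
    return [role["rolearn"] for role in previous_roles
            if role["rolearn"] not in current_arns]

# Previous apply stored two roles; the current config keeps only one.
previous = json.dumps([
    {"rolearn": "arn:aws:iam::111111111111:role/admin",
     "groups": ["system:masters"], "username": "admin"},
    {"rolearn": "arn:aws:iam::111111111111:role/dev",
     "groups": ["dev"], "username": "dev"},
])
current = [{"rolearn": "arn:aws:iam::111111111111:role/admin",
            "groups": ["system:masters"], "username": "admin"}]

print(roles_to_remove(previous, current))  # ['arn:aws:iam::111111111111:role/dev']
```

Anything returned by the diff can then be dropped from the ConfigMap on the next apply.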
As for this issue: https://github.com/hashicorp/terraform-provider-kubernetes/pull/1671#issuecomment-1086032868 ("A https://github.com/aidanmelen/terraform-aws-eks-auth/issues/14 will not automatically create an aws-auth configmap.") — I didn't manage to replicate it; my data source successfully reads the aws-auth ConfigMap, so I suspect AWS creates one automatically.
Are there plans to merge this?
@stevec-aztech This was completely on hold pending resolution of https://github.com/hashicorp/terraform-provider-aws/issues/25335
@sebastianmacarescu I really do appreciate the effort you have put into this. How much and what kind of testing have you done on it? Just looking at it makes me scared, particularly with the regular expressions, as those have a way of causing hard-to-find bugs and security vulnerabilities.
What size are the strings you have been storing in parameter store? I would like to know where the breaking point is, as some people have added a lot of roles to their auth map.
Plus, I'm always worried about the destroy behavior, as problems in destroying the cluster keep cropping up.
I really am looking forward to AWS giving us a proper API for the auth map, so we can end all these workarounds.
So I have spent some time realistically generating some big `map_additional_iam_roles` inputs for the ConfigMap, as this is the only thing I store in AWS Parameter Store.
The maximum size of a parameter value in the Advanced tier of AWS SSM Parameter Store is 8 KB (per the AWS documentation).
The current implementation allows for around 30 roles. But if we use base64gzip on the JSON data we get to around 60 roles, and we can get rid of those regexes and be a lot safer.
For anyone who wants to play with it, the Python code to generate a random dataset (with unrealistically long values for the IAM role names, group names, and usernames) is:
```python
import string
import random
import json
import gzip
import base64
from sys import getsizeof

def generate_role():
    # Role name between 30 and 64 chars
    account_id = ''.join(random.choices(string.digits, k=12))
    role_name = ''.join(random.choices(string.ascii_lowercase, k=random.randint(30, 64)))
    arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return arn

def generate_user():
    return ''.join(random.choices(string.ascii_lowercase, k=random.randint(10, 36)))

def generate_groups():
    return random.choices([''.join(random.choices(string.ascii_lowercase, k=random.randint(4, 20))) for i in range(10)], k=random.randint(1, 8))

data = [
    {
        "groups": generate_groups(),
        "rolearn": generate_role(),
        "username": generate_user()
    }
    for i in range(60)
]

json_str = json.dumps(data)
print("\nInitial data\n---------------")
# print(f"Data {json_str}")
print(f"Number of chars {len(json_str)}")
print(f"Size in bytes {getsizeof(json_str)}")
# print(f"Number KB (from chars) {len(json_str) / 1000}")

compressed = gzip.compress(json_str.encode('utf-8'))
encoded = base64.b64encode(compressed).decode('utf-8')
print("\nGzip compression\n---------------")
# print(f"Gzip {compressed}")
print(f"Number of chars {len(compressed)}")
print(f"Size in bytes {getsizeof(compressed)}")

# Emulate https://www.terraform.io/language/functions/base64gzip
print("\nB64 encoding of gzip\n---------------")
# print(f"B64 encoded of gzip {encoded}")
print(f"Number of chars {len(encoded)}")
print(f"Size in bytes {getsizeof(encoded)}")
```
Some output (two runs):
```text
Initial data
---------------
Number of chars 12881
Size in bytes 12930

Gzip compression
---------------
Number of chars 5772
Size in bytes 5805

B64 encoding of gzip
---------------
Number of chars 7696
Size in bytes 7745
```
```text
Initial data
---------------
Number of chars 12515
Size in bytes 12564

Gzip compression
---------------
Number of chars 5651
Size in bytes 5684

B64 encoding of gzip
---------------
Number of chars 7536
Size in bytes 7585
```
Also, I'm running the current code in a production EKS cluster and it works just fine as is, with around 8 additional IAM roles in the ConfigMap.
If you think this can be merged, I can simplify the code and use base64gzip to eliminate those hard-to-understand regexes and increase the number of extra IAM roles. I can also add a note in the documentation and possibly Terraform variable validation.
@sebastianmacarescu Thank you for this extra effort.
> If you think this can be merged, I can simplify the code and use base64gzip to eliminate those hard-to-understand regexes and increase the number of extra IAM roles. I can also add a note in the documentation and possibly Terraform variable validation.
We have other issues with using remote state triggering Terraform bugs, so yes, if you can use base64gzip and simplify the code, I think I can work it in with some other stuff I'm doing this weekend. I may need to make some changes to make the SSM parameter an optional solution (allowing people to keep using the current "remote state" solution). You certainly deserve credit, and we can give you credit in the release notes, but if it turns out to be easier, would it be OK with you if I copy your work into a new PR I am working on and close this as superseded by that? If that is not OK, I totally understand, and it would not be much extra work for me; I'm just checking on how you feel about it.
Also, we want to change the version pinning to `>= 3.38, != 4.18.0, != 4.19.0 ..., != 4.32.0`
Hi @Nuru, I will update the code to remove the regexes together with the new version constraint. I would keep the work in this PR and in a separate release, because to me it seems like a medium-sized change that should not be mixed with other features. Moreover, it will be easier to understand the new changes when browsing the comments on the original issue here. But if it's extra hard for you to integrate as a separate release, feel free to integrate it into your new PR. Thank you :)
Hi @Nuru. Can you review the new changes?
- removed those regexes
- replaced Parameter Store with Secrets Manager due to the bigger data size limit (8 KB vs 64 KB)
- updated the README with some more edge cases on delete failures, together with links to Terraform EKS provider bugs
- rebased the work on the latest upstream release
This new implementation allows storing around 180 IAM roles specified in the Terraform configuration. I have tested terraform create and update, as well as adding and removing IAM roles from the ConfigMap, and it works great.
Also, there is no gunzip function in Terraform, so storing the JSON string gzip-compressed and base64-encoded would not work: https://github.com/hashicorp/terraform/issues/22568
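The ~180-role estimate can be sanity-checked with a rough back-of-the-envelope script against the 64 KB Secrets Manager value limit, assuming worst-case entry sizes similar to the generator script earlier in the thread (the sizes here are assumptions, not measurements from the module):

```python
import json
import random
import string

SECRETS_MANAGER_LIMIT = 65536  # 64 KB max secret value size

def worst_case_role_entry():
    # Pessimistic entry: 64-char role name, 36-char username, 8 groups.
    account_id = ''.join(random.choices(string.digits, k=12))
    name = ''.join(random.choices(string.ascii_lowercase, k=64))
    return {
        "rolearn": f"arn:aws:iam::{account_id}:role/{name}",
        "username": ''.join(random.choices(string.ascii_lowercase, k=36)),
        "groups": [''.join(random.choices(string.ascii_lowercase, k=20)) for _ in range(8)],
    }

# Append entries until the plain-JSON payload no longer fits in one secret.
count = 0
entries = []
while True:
    entries.append(worst_case_role_entry())
    if len(json.dumps(entries)) > SECRETS_MANAGER_LIMIT:
        break
    count += 1

print(f"Roughly {count} worst-case roles fit in a single secret value")
```

With these pessimistic sizes the count lands close to the 180 figure; realistic role names and fewer groups would fit considerably more.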
Guys, any progress on this one?
Since this has come a long way on feedback and there is interest, a friendly ping of @Nuru + @aknysh
final ping for @Nuru , @aknysh and @Gowiem
This pull request now has conflicts. Could you fix it @sebastianmacarescu? 🙏
This PR has been closed due to inactivity and merge conflicts. Please resolve the conflicts and reopen if necessary.
Thanks @sebastianmacarescu for creating this pull request!
A maintainer will review your changes shortly. Please don't be discouraged if it takes a while.
While you wait, make sure to review our contributor guidelines.
> [!TIP]
> Need help or want to ask for a PR review to be expedited? Join us on Slack in the #pr-reviews channel.