
[EKS] [request]: associating and disassociating the identity provider is painfully slow

Open bseenu opened this issue 4 years ago • 10 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [23m20s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [23m30s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [23m40s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [23m50s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [24m0s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [24m10s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [24m20s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [24m30s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [24m40s elapsed]
aws_eks_identity_provider_config.eks_identity_provider_config: Still creating... [24m50s elapsed]
╷
│ Error: error waiting for EKS Identity Provider Config (sboga-test2:azure) association: context deadline exceeded
│
│   with aws_eks_identity_provider_config.eks_identity_provider_config,
│   on oidc.tf line 1, in resource "aws_eks_identity_provider_config" "eks_identity_provider_config":
│    1: resource "aws_eks_identity_provider_config" "eks_identity_provider_config" {
│
╵

I believe this is because the API server needs to be restarted. Can we optimize this and make it a bit faster?

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Improve the overall time to bring up and tear down an EKS cluster.
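Editor's sketch: the Terraform run above gave up after roughly 25 minutes while the association was still in progress. One way to measure how long the association really takes, independent of any tool's deadline, is to poll the config status directly. The helper below is a minimal sketch, not the provider's implementation. `wait_for_oidc_status` and its parameters are hypothetical names; it assumes the response shape of boto3's `describe_identity_provider_config`, where the OIDC status is one of `CREATING`, `DELETING`, or `ACTIVE`.

```python
import time
from typing import Callable


def wait_for_oidc_status(
    describe: Callable[[], dict],
    target: str = "ACTIVE",
    timeout_s: float = 60 * 60,  # assumed 1 h budget; the thread reports 25-60 min
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll until the OIDC config reports `target`; return elapsed seconds.

    `describe` is any zero-argument callable returning a dict shaped like
    the DescribeIdentityProviderConfig response.
    """
    start = clock()
    while True:
        status = describe()["identityProviderConfig"]["oidc"]["status"]
        elapsed = clock() - start
        if status == target:
            return elapsed
        if elapsed >= timeout_s:
            raise TimeoutError(f"status still {status!r} after {elapsed:.0f}s")
        sleep(poll_s)


# Usage sketch (requires boto3 and AWS credentials; names taken from the error above):
# import boto3
# eks = boto3.client("eks")
# describe = lambda: eks.describe_identity_provider_config(
#     clusterName="sboga-test2",
#     identityProviderConfig={"type": "oidc", "name": "azure"},
# )
# print(wait_for_oidc_status(describe))
```

Passing `describe`, `sleep`, and `clock` as callables keeps the loop testable without touching AWS.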

bseenu avatar Jul 15 '21 10:07 bseenu

We associate the OIDC provider via Crossplane with compositions. The problem is that for the whole time the OIDC configuration is being applied (25-60 min), the control plane is in status updating. An additional status code for OIDC configuration would be helpful.

haarchri avatar Dec 09 '21 20:12 haarchri

We use an identity provider as part of our standard build for EKS clusters. The problem, aside from the annoyance of adding ~1 hour to cluster build time, is that it makes spinning clusters up and down as part of automated tests so slow that it's impractical.

orangenagy avatar Jan 06 '22 11:01 orangenagy

+1 what @orangenagy said.

timblaktu avatar Mar 30 '22 00:03 timblaktu

@timblaktu I get the following error when disassociating the EKS Identity Provider.

│ Error: error disassociating EKS Identity Provider Config (devops-wheel-prototype-sam:okta): InvalidRequestException: Provided subnets subnet-0b0b5c97e8ded855b Free IPs: 0 , need at least 5 IPs in each subnet to be free for this operation
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "8afd2264-1c90-42be-a093-bebda5d2e390"
│   },
│   Message_: "Provided subnets subnet-0b0b5c97e8ded855b Free IPs: 0 , need at least 5 IPs in each subnet to be free for this operation"
│ }
│
│
╵
╷
│ Error: context deadline exceeded
│
│

What can be done in this case? We ran into a situation where the IPs were gobbled up earlier today.
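Editor's sketch: the error above says each cluster subnet needs at least 5 free IPs for the operation. A quick pre-flight check before disassociating could look like the following. It is a minimal sketch: `subnets_short_on_ips` is a hypothetical helper, and it assumes input shaped like the `Subnets` list returned by EC2's `describe_subnets` (fields `SubnetId` and `AvailableIpAddressCount`).

```python
MIN_FREE_IPS = 5  # threshold quoted in the InvalidRequestException above


def subnets_short_on_ips(subnets, minimum=MIN_FREE_IPS):
    """Return {subnet_id: free_ip_count} for subnets below `minimum`."""
    return {
        s["SubnetId"]: s["AvailableIpAddressCount"]
        for s in subnets
        if s["AvailableIpAddressCount"] < minimum
    }


# Usage sketch (requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2")
# resp = ec2.describe_subnets(SubnetIds=["subnet-0b0b5c97e8ded855b"])
# print(subnets_short_on_ips(resp["Subnets"]))
```

Any subnet this returns would need addresses freed up (or the offending workloads scaled down) before retrying the disassociation.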

sam19111 avatar May 04 '22 19:05 sam19111

It took me ~32 minutes to destroy it.

nitrocode avatar Jun 17 '22 19:06 nitrocode

Could there be a way to associate during the initial cluster build out? It seems like that would reduce the overall time instead of a separate operation post cluster creation.

maximmold avatar Jun 18 '22 04:06 maximmold

This is still a big problem. Taking ~30 mins each way.

tracetechnical avatar Jul 07 '22 21:07 tracetechnical

Same issue for me. Taking ~30 mins each way.

zensqlmonitor avatar Aug 04 '22 07:08 zensqlmonitor

Same issue here. Taking around 30 minutes each way and is terribly unproductive.

mrdanielmh avatar Sep 06 '22 15:09 mrdanielmh

All, what's the latest understanding of the root cause for this long-running process? Has anyone obtained any insight here? I'm considering using another EKS-OIDC-RBAC solution in another enterprise environment, and want to know what to expect in terms of bloated deployment time. I see from the recent comments and Open status that it's still a major problem.

Has anyone looked in a verbose CloudTrail for breadcrumbs indicating what's actually going on under the hood when the eks_identity_provider_config is Still creating...?
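Editor's sketch: one way to start the CloudTrail search suggested above is to pull the management events for the two EKS API calls involved. This is a minimal sketch under assumptions. `recent_idp_events` is a hypothetical helper, `cloudtrail` is assumed to be a boto3 CloudTrail client (or anything exposing `lookup_events`), and pagination is ignored. CloudTrail may well show only when the API calls were made, not what the control plane does afterwards.

```python
from datetime import datetime, timedelta, timezone

# EKS API actions behind associating and disassociating an identity provider config.
EVENT_NAMES = (
    "AssociateIdentityProviderConfig",
    "DisassociateIdentityProviderConfig",
)


def recent_idp_events(cloudtrail, hours=24, now=None):
    """Return (time, event name, username) tuples, oldest first.

    Only the first page of results per event name is read.
    """
    now = now or datetime.now(timezone.utc)
    found = []
    for name in EVENT_NAMES:
        # LookupEvents accepts a single lookup attribute per call.
        resp = cloudtrail.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": name}],
            StartTime=now - timedelta(hours=hours),
            EndTime=now,
        )
        for event in resp.get("Events", []):
            found.append((event["EventTime"], event["EventName"], event.get("Username")))
    return sorted(found, key=lambda item: item[0])


# Usage sketch (requires boto3 and AWS credentials):
# import boto3
# for row in recent_idp_events(boto3.client("cloudtrail")):
#     print(row)
```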

Since the configuration of each "client application" at the OIDC provider-side typically includes filters that limit the users and groups that the application cares about, one would think that there is not very much data that has to be transferred in this transaction between OIDC Provider and EKS control plane. If CloudTrail doesn't shine any light, perhaps we need a refresher on the OIDC Specs and Guides (or in my case, to learn it for the first time..).

@haarchri any updates on how this works/behaves using your Crossplane composition? This is one of the IaC approaches I'm now taking for EKS clusters, and am just wondering what I can learn from the trailblazers before me. ;-) What have been the practical implications of the K8s control plane staying in an updating... state for so long? What sort of system/health check timeouts did you have to extend, and is the presumably-hacked workaround solution actually working well enough, functionally, aside from having to go get a coffee (and lunch) every time you provision a new cluster?

timblaktu avatar Sep 08 '22 22:09 timblaktu

We are continually working to improve the speed of updates, but we are going to close this issue, as associating/disassociating identity providers now takes under 10 minutes per cluster. You can find more details here: https://aws.amazon.com/blogs/containers/amazon-eks-control-plane-auto-scaling-enhancements-improve-speed-by-4x/.

mikestef9 avatar Feb 09 '23 23:02 mikestef9

Under 10m is still unacceptable. No wonder people go to Azure for a competent K8S offering

tracetechnical avatar Feb 10 '23 09:02 tracetechnical

Hey AWS, looks like your API currently doesn't allow creating an EKS cluster with OIDC configuration already in it. This delay seems to be caused by the associate-iam-oidc-provider de-provisioning and re-provisioning the control plane.

The usability issue would be solved for 95+% of use cases if you'd extend the create-cluster call and allow setting up OIDC from the get-go.

kdomanski avatar Feb 13 '23 10:02 kdomanski

Furthermore, only a single OIDC configuration is supported currently. 🤦🏼

n3ph avatar Aug 22 '23 12:08 n3ph