
Clarify EKS install instructions

Open zfLQ2qx2 opened this issue 5 years ago • 9 comments

I want to set up kiam on an AWS EKS cluster and I'm running into difficulties.

All of the kiam walk-throughs I've seen start by creating a kiam-server role that has a trust relationship with the IAM role of the master nodes. With EKS the master nodes are hidden in an AWS-controlled account, so I can't do that.

The workaround I've seen discussed is to have the kiam-server role trust the worker nodes' role instead. However, I believe that allows any process on a worker node to assume the kiam-server role, and then any role that trusts kiam-server, so I don't think that is good advice.
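To make the concern concrete, here is a minimal sketch of the two trust policies involved in that workaround. The account ID and role names are hypothetical, made up for illustration; only the policy document shape and the `sts:AssumeRole` action are standard IAM.

```python
import json

# Hypothetical account ID and role names, for illustration only.
ACCOUNT_ID = "123456789012"
NODE_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/eks-worker-node"
KIAM_SERVER_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/kiam-server"


def trust_policy(principal_arn: str) -> dict:
    """Build an IAM trust policy that lets principal_arn call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Trust policy on the kiam-server role: the worker node role may assume it.
kiam_server_trust = trust_policy(NODE_ROLE_ARN)

# Trust policy on an application role (for example one used by external-dns):
# the kiam-server role may assume it.
app_role_trust = trust_policy(KIAM_SERVER_ROLE_ARN)

print(json.dumps(kiam_server_trust, indent=2))
print(json.dumps(app_role_trust, indent=2))
```

Read together, the two documents form a chain: anything that can obtain the worker node role's credentials can assume kiam-server, and from there any role that trusts kiam-server.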

There was also some discussion about running the kiam server and agent on the same servers, but there seems to be a conflict there with the iptables interception of traffic to the metadata service, so I don't think that is what I want to do either.

Lastly, I've seen several vague discussions about having a dedicated server for the kiam-server process, but I've not found an example of doing this. If someone has done this before and can give me steps to follow for a proof of concept, I'd appreciate it.

My end goal is to use this with the alb ingress and external dns addons without giving every pod on the same workers the ability to manage these resources directly.

zfLQ2qx2 avatar Mar 07 '19 20:03 zfLQ2qx2

I've been waiting for https://github.com/uswitch/kiam/pull/112 to land before switching over from kube2iam, but that pull request has been languishing.

casret avatar Mar 08 '19 18:03 casret

@pingles Do you have any advice? I've spent days on this. The first solution I came up with was to spin up a second set of workers that has the assume-role policy, but it seems like it would be a lot of work to keep others off those nodes. The second idea was to allocate an IAM user so that I didn't have to attach a policy to the worker nodes, but there is no indication that kiam can use one if provided (and I think someone with enough knowledge could find out what parameters I used, which is the same as giving sts:AssumeRole to the instance role). The third idea was to spin up a lone server in the same AWS security group as the control plane and try to just compile and run the kiam server component there. Can it work outside of Kubernetes? It looks like I can provide a server IP address in the deployment, but I don't know whether the kiam agent expects the address to be local or not.

zfLQ2qx2 avatar Mar 12 '19 03:03 zfLQ2qx2

We don't personally use EKS so we're somewhat hesitant to give too much advice. It's a shame that PR has been stuck for so long. Generally, the approach we would recommend when you don't control the masters is to spin up a dedicated node group that has the kiam server IAM permissions and is tainted so that only the kiam servers can run there.
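As a rough sketch of what the scheduling side of that could look like, the snippet below builds the pod spec fragment a kiam server workload would need to land on such a node group. The `dedicated=kiam-server` taint and label are assumptions chosen for illustration, not names kiam defines.

```python
import json

# Hypothetical taint/label used on the dedicated node group (an assumption,
# not something kiam prescribes).
TAINT_KEY = "dedicated"
TAINT_VALUE = "kiam-server"


def kiam_server_scheduling() -> dict:
    """Pod spec fields that pin a pod to the tainted, labelled node group."""
    return {
        # Only schedule onto nodes carrying the matching label.
        "nodeSelector": {TAINT_KEY: TAINT_VALUE},
        # Tolerate the NoSchedule taint that keeps other workloads away.
        "tolerations": [
            {
                "key": TAINT_KEY,
                "operator": "Equal",
                "value": TAINT_VALUE,
                "effect": "NoSchedule",
            }
        ],
    }


scheduling = kiam_server_scheduling()
print(json.dumps(scheduling, indent=2))
```

The nodes themselves would need the matching taint and label, for example via `kubectl taint nodes <node> dedicated=kiam-server:NoSchedule` and `kubectl label nodes <node> dedicated=kiam-server`. The printed fragment would be merged into the `spec.template.spec` of the kiam server workload.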

Joseph-Irving avatar Mar 12 '19 11:03 Joseph-Irving

@Joseph-Irving That is definitely one way to do it. My only concern is that I don't think there is anything stopping someone who knows the taint name from running a process there and gaining all the abilities that kiam has. They can learn the name from our git repo if nothing else.

zfLQ2qx2 avatar Mar 12 '19 13:03 zfLQ2qx2

I assume you're referring to people in your company/customers tolerating that taint. I would argue that's a more generic problem: you don't have a way to control how users of your cluster schedule their workloads. There is a long-running issue on Kubernetes about having RBAC around tolerating taints. For now I'd recommend looking at admission webhooks to ensure that workloads only tolerate what they're allowed to.
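A minimal sketch of the decision logic such a validating webhook might apply, assuming a hypothetical `dedicated` taint key and a hypothetical `kiam` namespace that is allowed to tolerate it. It only handles the `AdmissionReview` payload; the HTTPS server and the webhook registration are left out.

```python
# Hypothetical names: the taint key reserved for kiam servers and the
# namespaces allowed to tolerate it.
RESTRICTED_TAINT_KEY = "dedicated"
ALLOWED_NAMESPACES = {"kiam"}


def tolerates_restricted_taint(pod_spec: dict) -> bool:
    """Return True if any toleration could match the restricted taint."""
    for toleration in pod_spec.get("tolerations") or []:
        key = toleration.get("key")
        if key == RESTRICTED_TAINT_KEY:
            return True
        # An empty key with operator Exists tolerates every taint.
        if not key and toleration.get("operator") == "Exists":
            return True
    return False


def review(admission_review: dict) -> dict:
    """Build an AdmissionReview response for a Pod create request."""
    request = admission_review["request"]
    pod_spec = request["object"].get("spec", {})
    namespace = request.get("namespace", "")

    allowed = (
        namespace in ALLOWED_NAMESPACES
        or not tolerates_restricted_taint(pod_spec)
    )
    response = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {
            "message": f"pods in namespace {namespace!r} may not tolerate "
                       f"the {RESTRICTED_TAINT_KEY!r} taint"
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

For example, a pod in `default` carrying `{"key": "dedicated", "operator": "Exists"}` would be rejected, while the same pod in `kiam` would be admitted. Because Deployments and other controllers ultimately create Pods, checking Pod creation covers them as well.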

Joseph-Irving avatar Mar 19 '19 09:03 Joseph-Irving

We've written up a guide to setting up Kiam on EKS that you can look at, @zfLQ2qx2. It has a proof-of-concept set of templates and a script to provision everything in one go. That's an interesting point about admission webhooks, @Joseph-Irving; we'll look at that next. As it stands, one team controls all the deploys, so it's not an issue yet.

We use the "two worker nodes" system, which seems to work well for us. https://bambooengineering.io/2019/06/14/kiam-on-eks-with-helm.html

hlascelles avatar Jun 16 '19 19:06 hlascelles

But why waste node(s) just for the kiam server? I am dealing with this as well, running kiam servers on the same nodes (not masters) as agents. It kind of works, but when I reload many pods, the kiam agents and kiam servers restart. As a result, a pod does not get its role at start time and malfunctions.

Any advice?

MilanDasek avatar Jul 18 '19 08:07 MilanDasek

Agents and servers are separated to 1) ensure nodes running user workloads aren’t able to obtain IAM tokens for all possible roles, and 2) reduce load on the API server watches.

The Medium article in the README has a bit more background on why it ended up that way.

It’s definitely better if you can operate it that way. I’m sorry the out of the box experience on EKS isn’t great but we don’t use it yet so haven’t been able to contribute much directly.

pingles avatar Jul 18 '19 08:07 pingles

Other than the possibility of getting tokens for all possible roles, is there any other problem?

Currently my kiam server and agent restart when I restart 10 user pods.

MilanDasek avatar Jul 18 '19 09:07 MilanDasek