
run control-plane as non-root

Open neolit123 opened this issue 5 years ago • 41 comments

KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane
k/e issue: https://github.com/kubernetes/enhancements/issues/2568

This KEP proposes that the control-plane in kubeadm be run as non-root. If containers run as root, an escape from a container may result in escalation to root on the host. CVE-2019-5736 is an example of a container escape vulnerability that can be mitigated by running containers/pods as non-root.

kubeadm feature gate is called RootlessControlPlane
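
For anyone who wants to try the alpha, here is a minimal sketch of generating a kubeadm configuration with the gate turned on. It assumes the `kubeadm.k8s.io/v1beta3` config API and relies on JSON being a subset of YAML, so the printed output should be usable as a file passed to `kubeadm init --config`. The helper name `kubeadm_config_with_rootless` is hypothetical and is not part of kubeadm.

```python
import json


def kubeadm_config_with_rootless(enabled=True):
    """Build a ClusterConfiguration dict with the RootlessControlPlane gate set.

    Assumption: the v1beta3 kubeadm config API is in use.
    """
    return {
        "apiVersion": "kubeadm.k8s.io/v1beta3",
        "kind": "ClusterConfiguration",
        "featureGates": {"RootlessControlPlane": enabled},
    }


config = kubeadm_config_with_rootless()
# JSON is valid YAML, so this text can be saved as a config file.
rendered = json.dumps(config, indent=2)
print(rendered)
```

The same gate can likely also be set on the command line with `kubeadm init --feature-gates=RootlessControlPlane=true`; the config-file form is shown because it is easier to keep under version control.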

ALPHA 1.22:

  • [x] code changes:
    • [x] seccomp = runtime/default: https://github.com/kubernetes/kubernetes/pull/100234
    • [x] add feature gate: https://github.com/kubernetes/kubernetes/pull/102158
    • [x] add utils / constants: https://github.com/kubernetes/kubernetes/pull/102195 https://github.com/kubernetes/kubernetes/pull/102463 https://github.com/kubernetes/kubernetes/pull/102494 https://github.com/kubernetes/kubernetes/pull/102604
    • [x] (on the side) pipe dry-run option to static pod manifest utils: https://github.com/kubernetes/kubernetes/pull/102722
    • [x] run CP components as non-root: https://github.com/kubernetes/kubernetes/pull/102759
    • [x] run etcd as non-root: https://github.com/kubernetes/kubernetes/pull/102862
    • [x] fix bug in "download-certs" and permissions: https://github.com/kubernetes/kubernetes/pull/103313
    • [x] https://github.com/kubernetes/kubernetes/pull/103380
    • [x] https://github.com/kubernetes/kubernetes/pull/101988
  • [x] e2e tests https://github.com/kubernetes/kubeadm/pull/2511 https://github.com/kubernetes/test-infra/pull/22676 https://github.com/kubernetes/kubeadm/pull/2520 https://github.com/kubernetes/kubeadm/pull/2521 https://github.com/kubernetes/kubeadm/pull/2522

on hold until further notice. we are waiting for the user namespaces KEP to go GA:

  • https://github.com/kubernetes/enhancements/issues/127

BETA x.yy:

  • [ ] update KEP to tag Beta
  • [ ] start tracking the KEP in release spreadsheets (task for kubeadm leads)
  • [ ] make code changes in kubeadm (enable FG by default, test if upgrades work as is?)
  • [ ] update docs pages:
    • https://kubernetes.io/docs/reference/setup-tools/kubeadm/implementation-details/ would need to include some details on how the feature works under the hood - e.g. the user/group add/remove magic.
    • https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init should include a section about the new feature gate and how to turn it off. we need to remove the section when the FG is GA.
  • [ ] update e2e tests? https://github.com/kubernetes/kubeadm/blob/main/kinder/ci/tools/update-workflows/templates/workflows/rootless-tasks.yaml we need to ensure that we have test coverage for upgrading from "FG off" -> "FG on" by default and then checking if CP is rootless.

neolit123 avatar May 10 '21 15:05 neolit123

/assign vinayakankugoyal

neolit123 avatar May 10 '21 15:05 neolit123

cc @vinayakankugoyal

neolit123 avatar May 10 '21 15:05 neolit123

/assign vinayakankugoyal

I don't think this can be assigned to me because I am not a kubernetes org member. But to anyone following this bug, I will be working on it.

vinayakankugoyal avatar May 10 '21 15:05 vinayakankugoyal

Can we update the add feature gate: link in the description above to https://github.com/kubernetes/kubernetes/pull/102158

vinayakankugoyal avatar May 20 '21 19:05 vinayakankugoyal

@vinayakankugoyal https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#how-to-use-a-csi-volume if the rootless kubeadm apiserver eventually becomes ON by default, would it break CSI driver users?

neolit123 avatar Jun 02 '21 17:06 neolit123

@vinayakankugoyal https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#how-to-use-a-csi-volume if the rootless kubeadm apiserver eventually becomes ON by default, would it break CSI driver users?

no, because it is not the kube-apiserver that needs to run as a privileged pod; it is the CSI driver that needs to run as a privileged pod. --allow-privileged=true allows privileged containers; it does not make kube-apiserver's container privileged. (Same for kubelet, but that is out of scope for this KEP anyway.)

vinayakankugoyal avatar Jun 02 '21 19:06 vinayakankugoyal

/assign vinayakankugoyal

vinayakankugoyal avatar Jun 04 '21 11:06 vinayakankugoyal

Can we update e2e section above with PR: https://github.com/kubernetes/kubeadm/pull/2511

vinayakankugoyal avatar Jun 22 '21 19:06 vinayakankugoyal

I have been doing some investigation and testing of this alpha feature recently. Is there anything that should be done in 1.23? @neolit123 @vinayakankugoyal

pacoxu avatar Sep 22 '21 03:09 pacoxu

I have been doing some investigation and testing of this alpha feature recently.

let us know if you find any bugs. my biggest concern is around supporting linux distros that are non-standard in terms of system files.

Is there anything that should be done in 1.23?

the KEP was not updated for 1.23, with the premise to give the alpha one more release for users to test it. i didn't see anyone object to this plan.

neolit123 avatar Sep 22 '21 08:09 neolit123

It works well in my basic testing. I will keep running a non-root environment to see if any issues come up.

pacoxu avatar Sep 26 '21 06:09 pacoxu

hey @vinayakankugoyal i saw your PR to turn on the FG by default: https://github.com/kubernetes/kubernetes/pull/106869

in the issue description here i've enumerated the steps for this to graduate to beta in 1.24. would you be able to work on these tasks in the next 4 months?

also note that starting next week i will be on PTO until early Jan 2022, so not sure how much i can review until then.

neolit123 avatar Dec 08 '21 16:12 neolit123

Hi @neolit123 (long time 😄 ). Thanks for updating the bug with the beta graduation work. I'll be able to work on these tasks in the next 4 months.

vinayakankugoyal avatar Dec 08 '21 17:12 vinayakankugoyal

as noted earlier, there seems to be some activity on supporting user namespaces for pods in core k8s: https://github.com/kubernetes/enhancements/pull/3065 https://github.com/kubernetes/enhancements/pull/2101 (not sure which one is the KEP PR to watch, possibly the newer one).

The goal of supporting user namespaces in Kubernetes is to be able to run processes in pods with a different user and group IDs than in the host. Specifically, a privileged process in the pod runs as an unprivileged process in the host. If such a process is able to break out of the container to the host, it'll have limited impact as it'll be running as an unprivileged user there.
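
To make the quoted idea concrete, here is a small sketch of how an ID inside a user namespace translates to an ID on the host. It uses the three-column format of `/proc/<pid>/uid_map` (ID inside the namespace, ID outside, range length). The mapping `0 100000 65536` and the function names are illustrative assumptions, not values taken from the KEP.

```python
def parse_uid_map(text):
    """Parse lines in the /proc/<pid>/uid_map format: inside-ID outside-ID length."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        ranges.append((inside, outside, length))
    return ranges


def host_uid(container_uid, ranges):
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: 65536 IDs starting at host UID 100000.
example_map = parse_uid_map("0 100000 65536")
print(host_uid(0, example_map))      # root in the pod -> 100000 on the host
print(host_uid(1000, example_map))   # -> 101000
print(host_uid(70000, example_map))  # outside the mapped range -> None
```

With such a mapping, root inside the pod is an ordinary unprivileged UID on the host, which is why a container escape would have limited impact.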

@vinayakankugoyal what is your evaluation of the user namespaces KEP? do you see it as something that has end-goal overlap with the kubeadm RootlessControlPlane FG? you've mentioned that it would not support hostPath mounts for the Alpha. anything else to note about it?

user namespace support is a much desired change in k8s, and i consider what we have in kubeadm a bit of a hack that may bite us due to distro specific drift - we manually manage the user/groups to simplify the UX and so that users that want to not run the CP as root can get it automatically. possibly not a big issue, since distros seem standard WRT the system files for users/groups.

i think we need to evaluate whether we want to put a hold on the kubeadm feature moving to beta and instead wait for the user namespaces feature to go Beta, at which point we can start using it and set the right fields in the Pod spec and potentially remove the kubeadm feature.

but....the user namespaces KEP is still in review and there are some pending concerns and a lot of discussion there. as we discussed with @fabriziopandini in today's kubeadm meeting, we would have to evaluate if that KEP is not going to move forward in time. if it moves forward nicely we might want to start using it at some point. in the meantime users can use the kubeadm alpha feature. if it does not move forward in time, we are going to graduate the kubeadm feature.

if we move the kubeadm feature to Beta and eventually plan to remove it in favor of user namespaces, this is doable but means we are opting-in everyone and we have to maintain the feature for the Beta deprecation (e.g. 1 year).

neolit123 avatar Jan 05 '22 18:01 neolit123

the user namespaces KEP is still in review and there are some pending concerns and a lot of discussions there. as we discussed with @fabriziopandini in today's kubeadm meeting

IMO, there is some overlap between RootlessControlPlane in kubeadm and user namespace support in kubelet. They are just two ways to reach the same goal. However, there are no conflicts.

As RootlessControlPlane is alpha, could we re-design it to use user namespace if possible? It means that kubeadm can fall back to the current solution if the UserNamespace is not enabled on the master node.

We can promote RootlessControlPlane to beta in 1.24 as this is a good-enough solution in my mind. If RootlessControlPlane stays alpha, it will benefit fewer users as it is not the default behavior.

  • If UserNamespaceRemapping is alpha in 1.24, we can support migration to the user namespace solution. kubeadm init will first check the UserNamespaceRemapping feature gate. kubeadm can fall back to the current solution if the UserNamespace is not enabled on the master node.
  • If UserNamespaceRemapping is not ready in 1.24, we can support migration to the user namespace solution once it is ready. RootlessControlPlane will stay Beta.
  • Meanwhile, we have to either remove the feature gate or postpone the graduation of the kubeadm FG RootlessControlPlane until user namespaces are beta or GA (then we can remove RootlessControlPlane), as a beta FG is enabled by default.
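
The fallback proposed in the list above could look roughly like the following sketch. The gate name `UserNamespaceRemapping` is taken from this comment; the function `pick_rootless_strategy`, the `node_supports_userns` flag, and the strategy labels are hypothetical and do not exist in kubeadm.

```python
def pick_rootless_strategy(feature_gates, node_supports_userns):
    """Choose how to run the control plane, per the proposal in this comment.

    feature_gates: dict of gate name -> bool.
    node_supports_userns: whether user namespaces are usable on this node
    (how that would be detected is left open here).
    """
    if not feature_gates.get("RootlessControlPlane", False):
        return "root"  # feature disabled: today's default behavior
    if feature_gates.get("UserNamespaceRemapping", False) and node_supports_userns:
        return "user-namespace"  # preferred path once available
    return "kubeadm-managed-users"  # fall back to the current alpha approach


print(pick_rootless_strategy({"RootlessControlPlane": True}, False))
print(pick_rootless_strategy(
    {"RootlessControlPlane": True, "UserNamespaceRemapping": True}, True))
```

The point of the sketch is only the ordering of the checks: user namespaces are tried first, and the kubeadm-managed users and groups remain as the fallback.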

pacoxu avatar Feb 17 '22 08:02 pacoxu

As RootlessControlPlane is alpha, could we re-design it to use user namespace if possible?

We can talk more about that. I have not seen similar FG redesigns in k8s, but it sounds doable if we redesign the alpha. If our FG is already beta, a redesign contradicts the beta definition, at least in my book.

We can promote RootlessControlPlane to beta in 1.24 as this is a good-enough solution in my mind. If RootlessControlPlane stays alpha, it will benefit fewer users as it is not the default behavior.

The main problem with promoting FGs that we are not sure about to beta is that users start enabling them in production even if the FG is off by default. Then they start binding their infra to the implementation in weird ways. E.g. reusing the kubeadm managed uid/gids. Beta also means - the underlying design is stable and ready to mature.

I think if we are not sure about something, it is wiser to just wait...maybe one more release. I like where the conversations are going in the user namespace kep and i think they will solve the volume problems as well at some point. If we can, i think we should help drive that kep with what we can - pr reviews etc.

neolit123 avatar Feb 17 '22 13:02 neolit123

Then they start binding their infra to the implementation in weird ways. E.g. reusing the kubeadm managed uid/gids. Beta also means - the underlying design is stable and ready to mature.

It seems that RootlessControlPlane is not a mature approach. I walked through the RootlessControlPlane KEP, and the risk described at https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane#risks-and-mitigations is still there.

If we hard coded the UID and GID, we could end up in a scenario where those are in use by another process on the machine, which would expose some of the credentials accessible to the UID and GIDs to that process. So we plan to use adduser --system or using the appropriate ranges from /etc/login.defs instead of hard coding the UID and GID.
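
As an illustration of the "appropriate ranges from /etc/login.defs" idea in the quoted risk, here is a sketch that reads `SYS_UID_MIN` and `SYS_UID_MAX` and picks a system UID that no existing passwd entry uses. It is not kubeadm's actual implementation: the function names, the default range of 100 to 999, the lowest-free-first policy, and the sample file contents are all assumptions.

```python
def parse_login_defs(text):
    """Return KEY -> int for numeric settings in login.defs-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            settings[parts[0]] = int(parts[1])
    return settings


def used_uids(passwd_text):
    """Collect UIDs from passwd-style lines (name:passwd:uid:gid:...)."""
    uids = set()
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            uids.add(int(fields[2]))
    return uids


def first_free_system_uid(login_defs_text, passwd_text, default_range=(100, 999)):
    """Pick the lowest unused UID in the system range, or None if it is full."""
    settings = parse_login_defs(login_defs_text)
    low = settings.get("SYS_UID_MIN", default_range[0])
    high = settings.get("SYS_UID_MAX", default_range[1])
    taken = used_uids(passwd_text)
    for uid in range(low, high + 1):
        if uid not in taken:
            return uid
    return None


# Made-up sample inputs standing in for /etc/login.defs and /etc/passwd.
LOGIN_DEFS = "# system accounts\nSYS_UID_MIN 100\nSYS_UID_MAX 999\nUID_MIN 1000\n"
PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "daemon:x:100:100::/:/usr/sbin/nologin\n"
    "sshd:x:101:65534::/run/sshd:/usr/sbin/nologin\n"
)
print(first_free_system_uid(LOGIN_DEFS, PASSWD))  # 102
```

Real tools may allocate from the range differently, and a distro with non-standard system files would break the parsing here, which is the drift concern raised earlier in this thread.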

If this is not a mature approach, it should stay alpha and be removed once UserNamespace (a better solution, right?) is out.

All the suggestions in my last comment assume that either approach is a good-enough solution for running non-root on nodes. If not, the redesign is not acceptable.

pacoxu avatar Feb 28 '22 08:02 pacoxu

https://github.com/kubernetes/enhancements/pull/3065 is merged. 👍

pacoxu avatar Mar 09 '22 09:03 pacoxu

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 17 '22 03:06 k8s-triage-robot

/lifecycle frozen

neolit123 avatar Jun 17 '22 06:06 neolit123

our e2e for this feature started failing yesterday. i have no explanation for the time being. but i don't think it's a kubeadm problem, so maybe something in core? https://github.com/kubernetes/kubeadm/issues/2750

neolit123 avatar Aug 26 '22 08:08 neolit123

our e2e for this feature started failing yesterday. i have no explanation for the time being. but i don't think it's a kubeadm problem, so maybe something in core? #2750

Yes

https://github.com/kubernetes/kubernetes/pull/113548 (merged) may fix it. (It is a revert of https://github.com/kubernetes/kubernetes/pull/113408, which was merged hours before that.)

pacoxu avatar Nov 03 '22 09:11 pacoxu

it looks like the job has been green for a while, so maybe something else fixed it. the failures were in late august. i completely forgot about this..

https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-rootless-latest

neolit123 avatar Nov 03 '22 10:11 neolit123

I opened the testgrid (the one you posted months ago) and found that it failed yesterday (😓).

kubernetes/kubernetes#113548 may fix it. (a revert of kubernetes/kubernetes#113408 that was merged hours before that. )

Yesterday's failure was caused by that, not the failures in August. 😄

pacoxu avatar Nov 03 '22 10:11 pacoxu

Is this actually important-longterm? It's been a few years.

sftim avatar Aug 28 '23 20:08 sftim

/remove-priority important-soon

Is this actually important-longterm? It's been a few years.

This feature is an alternative to the user namespace feature. As we prefer to use user namespaces to secure the control plane in the future, we decided not to promote this one to beta. But we should keep this FG until user namespaces (https://github.com/kubernetes/enhancements/issues/127) are beta.

pacoxu avatar Aug 31 '23 05:08 pacoxu

https://github.com/kubernetes/enhancements/issues/127 User Namespace is beta in v1.30. We may start the deprecation of RootlessControlPlane in v1.31.

pacoxu avatar Mar 26 '24 08:03 pacoxu