run control-plane as non-root
KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane
k/e issue: https://github.com/kubernetes/enhancements/issues/2568
This KEP proposes that the control plane in kubeadm be run as non-root. If containers run as root, an escape from a container may result in escalation to root on the host. CVE-2019-5736 is an example of a container escape vulnerability that can be mitigated by running containers/pods as non-root.
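As a rough illustration of what the feature does (this is not kubeadm's actual code, which lives in Go; the UID/GID values are hypothetical), the effect is to add a `securityContext` to each static-pod manifest so the component runs with a dedicated non-root UID/GID and the `RuntimeDefault` seccomp profile:

```python
# Illustrative sketch only: kubeadm's real implementation manages dedicated
# system users/groups; UID/GID 998 below is a made-up example value.

def harden_static_pod(pod: dict, uid: int, gid: int) -> dict:
    """Add a non-root securityContext to every container in a static pod manifest."""
    sec = {
        "runAsUser": uid,      # dedicated non-root user, e.g. created via `adduser --system`
        "runAsGroup": gid,
        "runAsNonRoot": True,  # the kubelet refuses to start the container as UID 0
        "seccompProfile": {"type": "RuntimeDefault"},
    }
    for container in pod["spec"]["containers"]:
        container.setdefault("securityContext", {}).update(sec)
    return pod

# Minimal stand-in for a kube-apiserver static pod manifest:
apiserver = {"spec": {"containers": [{"name": "kube-apiserver"}]}}
patched = harden_static_pod(apiserver, uid=998, gid=998)
```

The field names match the Pod API's `securityContext`; everything else here is a simplification of the manifest-mutation kubeadm performs.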
The kubeadm feature gate is called `RootlessControlPlane`.
ALPHA 1.22:
- [x] code changes:
- [x] seccomp = runtime/default: https://github.com/kubernetes/kubernetes/pull/100234
- [x] add feature gate: https://github.com/kubernetes/kubernetes/pull/102158
- [x] add utils / constants: https://github.com/kubernetes/kubernetes/pull/102195 https://github.com/kubernetes/kubernetes/pull/102463 https://github.com/kubernetes/kubernetes/pull/102494 https://github.com/kubernetes/kubernetes/pull/102604
- [x] (on the side) pipe dry-run option to static pod manifest utils: https://github.com/kubernetes/kubernetes/pull/102722
- [x] run CP components as non-root: https://github.com/kubernetes/kubernetes/pull/102759
- [x] run etcd as non-root: https://github.com/kubernetes/kubernetes/pull/102862
- [x] fix bug in "download-certs" and permissions: https://github.com/kubernetes/kubernetes/pull/103313
- [x] https://github.com/kubernetes/kubernetes/pull/103380
- [x] https://github.com/kubernetes/kubernetes/pull/101988
- [x] e2e tests https://github.com/kubernetes/kubeadm/pull/2511 https://github.com/kubernetes/test-infra/pull/22676 https://github.com/kubernetes/kubeadm/pull/2520 https://github.com/kubernetes/kubeadm/pull/2521 https://github.com/kubernetes/kubeadm/pull/2522
On hold until further notice; we are waiting for the user namespaces KEP to go GA:
- https://github.com/kubernetes/enhancements/issues/127
BETA x.yy:
- [ ] update KEP to tag Beta
- [ ] start tracking the KEP in release spreadsheets (task for kubeadm leads)
- [ ] make code changes in kubeadm (enable FG by default, test if upgrades work as is?)
- [ ] update docs pages:
  - https://kubernetes.io/docs/reference/setup-tools/kubeadm/implementation-details/ would need to include some details on how the feature works under the hood, e.g. the user/group add/remove magic.
  - https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init should include a section about the new feature gate and how to turn it off. We need to remove that section when the FG goes GA.
- [ ] update e2e tests? https://github.com/kubernetes/kubeadm/blob/main/kinder/ci/tools/update-workflows/templates/workflows/rootless-tasks.yaml. We need to ensure that we have test coverage for upgrading from "FG off" to "FG on" by default, and then checking that the CP is rootless.
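For the kubeadm-init docs item above, enabling the gate while it is alpha (it is off by default) looks roughly like the following; please verify the exact syntax against the current docs:

```yaml
# kubeadm ClusterConfiguration fragment; equivalently, pass
# --feature-gates=RootlessControlPlane=true to `kubeadm init`.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
featureGates:
  RootlessControlPlane: true
```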
/assign vinayakankugoyal
cc @vinayakankugoyal
/assign vinayakankugoyal
I don't think this can be assigned to me because I am not a kubernetes org member. But to anyone following this bug, I will be working on it.
Can we update the "add feature gate" link in the description above to https://github.com/kubernetes/kubernetes/pull/102158?
@vinayakankugoyal https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#how-to-use-a-csi-volume if the rootless kubeadm apiserver eventually becomes ON by default, would it break CSI driver users?
> @vinayakankugoyal https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#how-to-use-a-csi-volume if the rootless kubeadm apiserver eventually becomes ON by default, would it break CSI driver users?
No, because it is not the kube-apiserver that needs to run as a privileged pod; it is the CSI driver that needs to run as a privileged pod. `--allow-privileged=true` allows privileged containers; it does not make the kube-apiserver's own container privileged. (Same for the kubelet, but that is out of scope for this KEP anyway.)
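To make that distinction concrete, here are two hypothetical container-spec fragments (Python dicts standing in for manifest YAML; the `securityContext` field names follow the Pod API, the rest is illustrative):

```python
# The CSI driver pod requests privilege in ITS OWN container spec
# (typically needed for node-level mount operations):
csi_node_container = {
    "name": "csi-driver",  # hypothetical driver container
    "securityContext": {"privileged": True},
}

# The kube-apiserver container stays unprivileged and non-root, even though
# the apiserver is the component that enforces --allow-privileged: the flag
# only *permits* other pods to request privilege.
apiserver_container = {
    "name": "kube-apiserver",
    "securityContext": {"runAsNonRoot": True, "privileged": False},
}
```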
/assign vinayakankugoyal
Can we update the e2e section above with PR https://github.com/kubernetes/kubeadm/pull/2511?
I have been doing some investigation and testing on this alpha feature recently. Is there anything that should be done in 1.23? @neolit123 @vinayakankugoyal
> I have been doing some investigation and testing on this alpha feature recently.
Let us know if you find any bugs. My biggest concern is around supporting Linux distros that are non-standard in terms of system files.
> Is there anything that should be done in 1.23?
The KEP was not updated for 1.23, with the premise of giving the alpha one more release for users to test it. I didn't see anyone object to this plan.
It works well in my basic testing, and I will keep running such a non-root env to see if any issues come up.
Hey @vinayakankugoyal, I saw your PR to turn on the FG by default: https://github.com/kubernetes/kubernetes/pull/106869
In the issue description here I've enumerated the steps for this to graduate to beta in 1.24. Would you be able to work on these tasks in the next 4 months?
Also note that starting next week I will be on PTO until early Jan 2022, so I'm not sure how much I can review until then.
Hi @neolit123 (long time 😄 ). Thanks for updating the bug with the beta graduation work. I'll be able to work on these tasks in the next 4 months.
As noted earlier, there seems to be some activity on supporting user namespaces for pods in core k8s: https://github.com/kubernetes/enhancements/pull/3065 https://github.com/kubernetes/enhancements/pull/2101 (not sure which one is the KEP PR to watch; possibly the newer one).
The goal of supporting user namespaces in Kubernetes is to be able to run processes in pods with a different user and group IDs than in the host. Specifically, a privileged process in the pod runs as an unprivileged process in the host. If such a process is able to break out of the container to the host, it'll have limited impact as it'll be running as an unprivileged user there.
@vinayakankugoyal what is your evaluation of the user namespaces KEP? do you see it as something that has end-goal overlap with the kubeadm RootlessControlPlane FG? you've mentioned that it would not support hostPath mounts for the Alpha. anything else to note about it?
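For background on the user namespaces approach under discussion: the kernel translates container UIDs to unprivileged host UIDs via `uid_map`-style ranges, so root (UID 0) inside the pod becomes a high, unprivileged UID on the host. A toy sketch of that mapping (the ranges below are made up for illustration):

```python
def map_to_host_uid(container_uid: int, mappings) -> int:
    """Translate a container UID to a host UID using uid_map-style
    (container_start, host_start, length) ranges."""
    for c_start, h_start, length in mappings:
        if c_start <= container_uid < c_start + length:
            return h_start + (container_uid - c_start)
    raise ValueError("UID not mapped in this user namespace")

# Example: container UIDs 0..65535 mapped onto host UIDs starting at 100000,
# so in-pod root (UID 0) is the unprivileged host UID 100000.
mappings = [(0, 100000, 65536)]
```

This is the property that limits the blast radius of a container escape: the escaped process holds only an unprivileged host UID.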
User namespace support is a much-desired change in k8s, and I consider what we have in kubeadm a bit of a hack that may bite us due to distro-specific drift: we manually manage the users/groups to simplify the UX, so that users who want to not run the CP as root get that automatically. Possibly not a big issue, since distros seem standard WRT the system files for users/groups.
I think we need to evaluate whether we want to put a hold on the kubeadm feature moving to beta and instead wait for the user namespaces feature to go Beta, at which point we can start using it, set the right fields in the Pod spec, and potentially remove the kubeadm feature.
But... the user namespaces KEP is still in review and there are some pending concerns and a lot of discussion there. As we discussed with @fabriziopandini in today's kubeadm meeting, we would have to evaluate whether that KEP is going to move forward in time. If it moves forward nicely, we might want to start using it at some point; in the meantime users can use the kubeadm alpha feature. If it does not move forward in time, we will graduate the kubeadm feature.
If we move the kubeadm feature to Beta and eventually plan to remove it in favor of user namespaces, that is doable, but it means we are opting everyone in and we have to maintain the feature for the Beta deprecation period (e.g. 1 year).
> the user namespaces KEP is still in review and there are some pending concerns and a lot of discussion there. as we discussed with @fabriziopandini in today's kubeadm meeting
IMO, there is some overlap between RootlessControlPlane in kubeadm and user namespace support in the kubelet. They are just two ways to achieve the same goal, and there is no conflict between them.
As RootlessControlPlane is alpha, could we re-design it to use user namespaces if possible? That would mean kubeadm falls back to the current solution if user namespaces are not enabled on the control-plane node.
We can promote RootlessControlPlane to beta in 1.24, as it is a good-enough solution in my mind. If RootlessControlPlane stays alpha, it will benefit fewer users, as it is not default behavior.
- If UserNamespaceRemapping is alpha in 1.24, we can support migration to the user-namespace solution. `kubeadm init` would check the UserNamespaceRemapping feature gate first, and kubeadm would fall back to the current solution if user namespaces are not enabled on the node.
- If UserNamespaceRemapping is not ready in 1.24, we can support migration to the user-namespace solution once it is ready; RootlessControlPlane would remain Beta.
- Meanwhile, we would have to either remove the feature gate or postpone the graduation of the kubeadm RootlessControlPlane FG until user namespaces are beta or GA (at which point we can remove RootlessControlPlane), since graduation makes it default behavior.
> As RootlessControlPlane is alpha, could we re-design it to use user namespaces if possible?
We can talk more about that. I have not seen similar FG redesigns in k8s, but it sounds doable if we redesign during alpha. If our FG is already beta, a redesign contradicts the beta definition, at least in my book.
> We can promote RootlessControlPlane to beta in 1.24 as this is a good-enough solution in my mind. If RootlessControlPlane keeps being alpha, it will benefit fewer users as it is not default behavior.
The main problem with promoting FGs that we are not sure about to beta is that users start enabling them in production even if the FG is off by default. Then they start binding their infra to the implementation in weird ways, e.g. reusing the kubeadm-managed UIDs/GIDs. Beta also means the underlying design is stable and ready to mature.
I think if we are not sure about something, it is wiser to just wait... maybe one more release. I like where the conversations are going in the user namespaces KEP, and I think they will solve the volume problems as well at some point. If we can, I think we should help drive that KEP with what we can: PR reviews, etc.
> Then they start binding their infra to the implementation in weird ways. E.g. reusing the kubeadm managed uid/gids. Beta also means - the underlying design is stable and ready to mature.
It seems that RootlessControlPlane is not a mature approach. I walked through the RootlessControlPlane KEP, and the risk described in https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane#risks-and-mitigations is still there:
> If we hard-coded the UID and GID, we could end up in a scenario where those are in use by another process on the machine, which would expose some of the credentials accessible to the UID and GID to that process. So we plan to use `adduser --system` or use the appropriate ranges from `/etc/login.defs` instead of hard-coding the UID and GID.
If this is not a mature approach, it should stay alpha and be removed once user namespaces (a better solution, right?) are out.
All my suggestions in my last comments are based on the assumption that either one is a good-enough solution for non-root on nodes. If not, the redesign is not acceptable.
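On the mitigation quoted above: rather than hard-coding UIDs, an installer can allocate from the system ranges a distro declares in `/etc/login.defs` (`SYS_UID_MIN`/`SYS_UID_MAX`), the same ranges `adduser --system` consults. A hedged sketch of reading that range (the parser and fallback values are illustrative, not kubeadm's code):

```python
def system_uid_range(login_defs_text: str, default=(100, 999)):
    """Extract the SYS_UID_MIN/SYS_UID_MAX range from /etc/login.defs content.

    Falls back to a conventional default range if the keys are absent,
    since some distros omit them.
    """
    lo, hi = default
    for line in login_defs_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SYS_UID_MIN":
            lo = int(parts[1])
        elif len(parts) >= 2 and parts[0] == "SYS_UID_MAX":
            hi = int(parts[1])
    return lo, hi

# Example login.defs content (abridged):
sample = "UID_MIN 1000\nSYS_UID_MIN 100\nSYS_UID_MAX 999\n"
```

Picking a free UID inside this range avoids colliding with regular-user UIDs, though it still does not guarantee the chosen UID is unused by another local service; that is the distro-drift risk discussed in this thread.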
https://github.com/kubernetes/enhancements/pull/3065 is merged. 👍
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
Our e2e for this feature started failing yesterday. I have no explanation for the time being, but I don't think it's a kubeadm problem, so maybe something in core? https://github.com/kubernetes/kubeadm/issues/2750
> our e2e for this feature started failing yesterday. i have no explanation for the time being. but i don't think it's a kubeadm problem, so maybe something in core? #2750
Yes
https://github.com/kubernetes/kubernetes/pull/113548 (merged) may fix it (a revert of https://github.com/kubernetes/kubernetes/pull/113408, which was merged hours before that).
It looks like the job has been green for a while, so maybe something else fixed it. The failures were in late August; I completely forgot about this.
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-rootless-latest
I opened the testgrid (which you posted months ago) and found it failed yesterday (😓).
> kubernetes/kubernetes#113548 may fix it. (a revert of kubernetes/kubernetes#113408 that was merged hours before that.)
Yesterday's failure was caused by that, not the failures in August. 😄
Is this actually important-longterm? It's been a few years.
/remove-priority important-soon
> Is this actually important-longterm? It's been a few years.
This feature is an alternative to the user namespaces feature. As we prefer to use user namespaces to secure the control plane in the future, we decided not to promote this one to beta. But we should keep this FG until user namespaces (https://github.com/kubernetes/enhancements/issues/127) reach beta.
User namespaces (https://github.com/kubernetes/enhancements/issues/127) reached beta in v1.30. We may start the deprecation of RootlessControlPlane in v1.31.