kubernetes-ingress
NIC should be able to run with the "restricted" POD security level
WIP
Summary
NIC is currently (3.x) required to run as a privileged POD with added capabilities. This is not ideal from a security perspective and not aligned with best practice container security guidelines and standards such as:
- CIS Kubernetes as briefly explained by Aquasec
- NSA Kubernetes Hardening guide
To improve the security posture, NIC should be able to run with the restricted POD security level. See Pod Security Standards for more information.
Motivation
NIC is usually exposed to the Internet and thus a target for all kinds of attacks. The project should always strive to improve the security of NIC.
Goals
- Secure by default
- Restricted security level in deployment resources
Non-goals
- Other security improvements not required by the restricted level such as read-only root filesystem (#1677 )
Proposal
TBD
Hi @hafe thanks for reporting!
Be sure to check out the docs while you wait for a human to take a look at this :slightly_smiling_face:
Cheers!
Please leave default HTTP/S ports as defaults.
Most users of the Ingress Controller will want ports 80 and 443 used. This would particularly impact those who expose NIC to the Internet, as a port change would require users to type https://nginx-ingress.com:8443/
Docker v20.10.0 (released 2020-12-08) can be seen as supporting binding privileged ports with no capabilities, as it automatically sets sysctl net.ipv4.ip_unprivileged_port_start=0 via https://github.com/moby/moby/pull/41030 merged as https://github.com/moby/moby/commit/888da28d42f7d0f9fa250dd8a75d51c2a6cf3098.
Similarly, K8s docs state that net.ipv4.ip_unprivileged_port_start is considered a safe sysctl since Kubernetes v1.22.
PodSpec could contain securityContext.sysctls with { "name": "net.ipv4.ip_unprivileged_port_start", "value": "0" }
The safe sysctl set, based on documentation, should be namespaced and not interfere with other Pods or the node. This to me implies that with Kubernetes, even if host network is used, the sysctl should be safe to specify.
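As a sketch (field names taken from the Kubernetes Pod API; the actual NIC deployment may structure this differently), the Pod spec could carry:

```yaml
# Sketch: allow unprivileged binding of ports below 1024 inside the Pod's
# network namespace, removing the need for the NET_BIND_SERVICE capability.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ingress            # hypothetical name for illustration
spec:
  securityContext:
    sysctls:
      - name: net.ipv4.ip_unprivileged_port_start
        value: "0"
  containers:
    - name: nginx-ingress
      image: nginx/nginx-ingress:edge   # assumed image tag
      ports:
        - containerPort: 80
        - containerPort: 443
```

Per the Kubernetes documentation quoted above, this sysctl is in the safe set since v1.22, so it should be allowed without any cluster-level sysctl allowlisting.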
I have not yet experimented with this, but a general solution, setting the sysctl, sounds more elegant:
- remove the `NET_BIND_SERVICE` capability for the Pod,
- change `allowPrivilegeEscalation` to `false` for the Pod,
- remove the `setcap` from the binary.
This would however leave containers running in host network in a peculiar situation. The mentioned Docker runtime change avoids setting the sysctl for host networking (We do not set network sysctls if network namespace is host) as that ends up changing the native host sysctl when outside of namespace.
Such a scenario sounds like an issue for people who run Docker natively on the host and want to use the NGINX Ingress Controller. Is this a supported use case?
@hafe, is there any aspect in which NIC does not comply with Restricted security level, other than allowPrivilegeEscalation?
I guess not. It is hard to tell when you can't test. Cap net bind seems to be allowed - https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
So what is the reason allowPrivilegeEscalation is set to start with?
Allowing Privilege Escalation is done to support NET_BIND_SERVICE in Nginx process when Entrypoint (Ingress Controller) doesn't have it as Permitted/Effective. (Previous discussion: https://github.com/nginxinc/kubernetes-ingress/issues/1677#issuecomment-866499090)
Yes that makes sense from my testing. When I changed to high ports I could remove allowPrivilegeEscalation
@hafe, how would proposal in #3573 look to you?
Alternative approach could be that we set cap_net_bind_service=+ep on /nginx-ingress itself, so there's no "escalation" (IC process gets NET_BIND_SERVICE and then Nginx gets it too). This may be considered more preferable from a security standpoint.
Sounds like a good approach!
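In image-build terms, the setcap variant could be sketched roughly like this (hypothetical base image and binary path; the real NIC Dockerfile differs):

```dockerfile
# Sketch only: grant the controller binary the capability at build time,
# so the process starts with NET_BIND_SERVICE already Permitted/Effective
# and allowPrivilegeEscalation can remain false in the Pod spec.
FROM debian:bookworm-slim
# libcap2-bin provides the setcap utility (assumed package name for Debian)
RUN apt-get update && apt-get install -y libcap2-bin \
    && rm -rf /var/lib/apt/lists/*
COPY nginx-ingress /nginx-ingress
RUN setcap 'cap_net_bind_service=+ep' /nginx-ingress
USER 101
ENTRYPOINT ["/nginx-ingress"]
```

The file capability travels with the binary, so no runtime privilege escalation is involved when the process binds ports 80/443.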
As part of the linked PR, it was identified that the underlying process does not drop NET_BIND_SERVICE once that capability is effective.
Therefore, with the existing approach, even if privilege escalation is restricted, code execution in an attack chain could still result in binding the low ports.
Instead, nginxinc/kubernetes-ingress@8be01446762dcaae9a2916b3d59ca78c6ca5670f applies the sysctl change to remove privilege requirement (and in effect remove the Escalation use case).
For the scope of Kubernetes policy, this complies (as it's a safe sysctl that individual pods can obtain since K8s v1.22). In future, someone may want to pick up the task to re-implement the Capability, and add proper bind+drop.
You should be able to experiment on the main branch to collect information on what other policy changes are needed. Keep in mind you will have to locally build the image, and cannot rely on previous release's image.
@hafe, is there any aspect in which NIC does not comply with Restricted security level, other than allowPrivilegeEscalation?
This must also be set in the security context to comply:
seccompProfile:
  type: RuntimeDefault
Warning: existing pods in namespace "nginx-ingress" violate the new PodSecurity enforce level "restricted:latest"
Warning: nginx-nginx-ingress-7f55b6c8d4-zdhv8: allowPrivilegeEscalation != false, seccompProfile
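Putting the pieces from this thread together, a container securityContext that passes the restricted checks could look roughly like this (a sketch, not the chart's actual template):

```yaml
# Sketch of a restricted-compliant container securityContext:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  runAsNonRoot: true
  runAsUser: 101          # the nginx user in the NIC image
  seccompProfile:
    type: RuntimeDefault
```

This addresses both items in the warning above: privilege escalation is disabled and the seccomp profile is pinned to the runtime default.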
I like this! I write something like a requirement and others implement 😁
This must also be set in the security context to comply:
seccompProfile:
  type: RuntimeDefault
@blurpy, would you mind opening a Pull Request for this with you as author?
I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the mean time.
Opened #3629 on your behalf.
@sigv thank you for the PR. We are going to review it on our side. If we need anything or have any questions, we will update your PR thread.
@blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check if there are any other low hanging fruit that the scan picks up?
Excellent, thank you! I will have to get back to you on that.
I finally got some time to play with the latest main. Nice work with all the security improvements lately! What I found is that running with UID 101 is now what is blocking use of the restricted policy.
@hafe, where are you seeing UID 101 being an issue? Is there some reference document/source you could link to? My understanding that any non-zero UID should be okay, based on current Pod Security Standards wording:
Running as Non-root user (v1.23+)
Containers must not set runAsUser to 0.
Restricted Fields:
- spec.securityContext.runAsUser
- spec.containers[*].securityContext.runAsUser
- spec.initContainers[*].securityContext.runAsUser
- spec.ephemeralContainers[*].securityContext.runAsUser
Allowed Values:
- any non-zero value
- undefined/null
OKD/Openshift gives each namespace a UID range and allocates a random UID from that to a pod. If you need a fixed UID you need to use the anyuid policy or a custom one. But I need to play more with this
Understood, it's about OpenShift's restricted-v2 security context constraint (restricted for OpenShift v4.10 and older).
The restricted SCC: [..] Requires that a pod is run as a user in a pre-allocated range of UIDs
This needs further investigation. I have not worked hands-on with OpenShift, so I am not 100% familiar with their approach. The Red Hat Blog has "A Guide to OpenShift and UIDs", which seems like a decent entry point into the topic.
@hafe, if you instead apply the anyuid SCC, how does it look? The documentation linked above says it provides all features of the restricted SCC, but allows users to run with any UID and any GID.
I will do some more checks, but otherwise I think this particular issue could be rephrased and closed.
I am taking a closer look and PodSecurityContext (v1) says runAsUser defaults to user specified in image metadata if unspecified. OpenShift's Example security context constraints section discusses when no explicit user ID is provided as well.
@hafe, could you check restricted-v2 by setting runAsUser: null, if you have a moment? Proposed diff available in #3665.
OpenShift admission plugin should check openshift.io/sa.scc.uid-range and assign the first UID, if I am reading this right.
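The idea from #3665, sketched at the Pod-spec level: leave runAsUser unset so OpenShift's SCC admission can assign a UID from the namespace's pre-allocated range (an assumption based on the documentation linked above):

```yaml
# Sketch: no fixed UID, so the restricted-v2 SCC can inject one.
securityContext:
  # runAsUser intentionally omitted/null: on OpenShift the admission plugin
  # assigns the first UID from the namespace's openshift.io/sa.scc.uid-range
  # annotation; on plain Kubernetes the image's USER directive applies.
  runAsNonRoot: true
  allowPrivilegeEscalation: false
```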
@sigv Initial testing looks good! We have the controller running with the restricted profile now.
@blurpy, just to double check, as there are competing request scopes: Kubernetes restricted with latest release, or OpenShift modifying 'run as user'?
@sigv using the restricted pod security standard: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
The namespace is configured as follows:
kubectl describe ns nginx-ingress
Name:         nginx-ingress
Labels:       kubernetes.io/metadata.name=nginx-ingress
              pod-security.kubernetes.io/audit=restricted
              pod-security.kubernetes.io/audit-version=latest
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=latest
              pod-security.kubernetes.io/warn=restricted
              pod-security.kubernetes.io/warn-version=latest
Annotations:  <none>
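The same namespace configuration expressed as a manifest (equivalent to the labels shown in the describe output above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-ingress
  labels:
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
```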
Still at the PoC stage though, so perhaps we will discover something more later.
I originally thought about changing the templates so that high (>1024) ports could optionally be configured. This could then be used with normal (non-host) networking, removing the need for privileged mode. I never got around to testing it. Is this a feasible path?
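For reference, the high-port idea sketched at the container level (hypothetical port values, not an existing chart option):

```yaml
# Sketch: bind only high ports in the container, so no capability,
# sysctl, or setcap is needed at all.
containers:
  - name: nginx-ingress
    ports:
      - name: http
        containerPort: 8080   # high port: no NET_BIND_SERVICE required
      - name: https
        containerPort: 8443
```

With non-host networking, a Service could still expose 80/443 externally and target the high container ports (e.g. `port: 443`, `targetPort: 8443`), so end users would not need to type a port in the URL.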
@hafe, are you on OpenShift 4.11+? Or are you running on the 4.10 Maintenance Support?
I am asking as I want to hear if you have the restricted-v2 SCC available.