
NIC should be able to run with the "restricted" Pod security level

hafe opened this issue 2 years ago • 37 comments

WIP

Summary

NIC (3.x) is currently required to run as a privileged Pod with added capabilities. This is not ideal from a security perspective and is not aligned with best-practice container security guidelines and standards.

To improve the security posture, NIC should be able to run with the restricted POD security level. See Pod Security Standards for more information.

Motivation

NIC is usually exposed to the Internet and thus a target for all kinds of attacks. The project should always strive to improve the security of NIC.

Goals

  • Secure by default
  • Restricted security level in deployment resources

Non-goals

  • Other security improvements not required by the restricted level, such as a read-only root filesystem (#1677)

Proposal

TBD

hafe avatar Feb 11 '23 08:02 hafe

Hi @hafe thanks for reporting!

Be sure to check out the docs while you wait for a human to take a look at this 🙂

Cheers!

github-actions[bot] avatar Feb 11 '23 08:02 github-actions[bot]

Please leave the default HTTP/S ports as defaults.

Most users of the Ingress Controller will want ports 80 and 443. A port change would particularly impact those who expose NIC to the Internet, as it would force users to type e.g. https://nginx-ingress.com:8443/


Docker v20.10.0 (released 2020-12-08) supports binding privileged ports with no capabilities, as it automatically sets the sysctl net.ipv4.ip_unprivileged_port_start=0 via https://github.com/moby/moby/pull/41030, merged as https://github.com/moby/moby/commit/888da28d42f7d0f9fa250dd8a75d51c2a6cf3098.

Similarly, the K8s docs state that net.ipv4.ip_unprivileged_port_start is considered a safe sysctl since Kubernetes v1.22. The PodSpec could contain securityContext.sysctls with { "name": "net.ipv4.ip_unprivileged_port_start", "value": "0" }.

The safe sysctl set, per the documentation, should be namespaced and should not interfere with other Pods or the node. To me this implies that with Kubernetes, even if the host network is used, the sysctl should be safe to specify.
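For illustration, a minimal Pod spec applying that safe sysctl might look like the following sketch (image name and tag are placeholders, not the project's actual manifest):

```yaml
# Sketch only: the safe sysctl lets the container bind ports below 1024
# without the NET_BIND_SERVICE capability. Names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ingress-example
spec:
  securityContext:
    sysctls:
      - name: net.ipv4.ip_unprivileged_port_start
        value: "0"
  containers:
    - name: nginx-ingress
      image: nginx/nginx-ingress:edge   # placeholder image reference
      ports:
        - containerPort: 80
        - containerPort: 443
```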

I have not yet experimented with this, but a general solution, setting the sysctl, sounds more elegant:

  • remove the NET_BIND_SERVICE capability from the Pod,
  • change allowPrivilegeEscalation to false for the Pod,
  • remove the setcap from the binary.
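With the sysctl handling low ports, the first two bullets could translate into a container securityContext roughly like this (a sketch under those assumptions, not the chart's actual defaults):

```yaml
# Sketch: tightened container securityContext once the sysctl covers low ports.
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL            # NET_BIND_SERVICE no longer needed
  runAsNonRoot: true
  runAsUser: 101       # UID used by the NIC image
```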

This would, however, leave containers running in the host network in a peculiar situation. The mentioned Docker runtime change avoids setting the sysctl for host networking ("We do not set network sysctls if network namespace is host"), as doing so outside a namespace would end up changing the host's native sysctl.

Such a scenario sounds like an issue for people who run Docker natively on the host and want to use the NGINX Ingress Controller. Is this a supported use case?

sigv avatar Feb 20 '23 12:02 sigv

@hafe, is there any aspect in which NIC does not comply with Restricted security level, other than allowPrivilegeEscalation?

sigv avatar Feb 20 '23 12:02 sigv

@hafe, is there any aspect in which NIC does not comply with Restricted security level, other than allowPrivilegeEscalation?

I guess not. It is hard to tell when you can't test. The NET_BIND_SERVICE capability seems to be allowed: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

hafe avatar Feb 20 '23 14:02 hafe

So what is the reason allowPrivilegeEscalation is set in the first place?

hafe avatar Feb 20 '23 15:02 hafe

Allowing privilege escalation is done to support NET_BIND_SERVICE in the NGINX process when the entrypoint (Ingress Controller) doesn't have it as Permitted/Effective. (Previous discussion: https://github.com/nginxinc/kubernetes-ingress/issues/1677#issuecomment-866499090)

sigv avatar Feb 20 '23 15:02 sigv

Yes, that matches my testing. When I changed to high ports, I could remove allowPrivilegeEscalation.

hafe avatar Feb 20 '23 15:02 hafe

@hafe, how would the proposal in #3573 look to you?

sigv avatar Feb 21 '23 07:02 sigv

An alternative approach could be to set cap_net_bind_service=+ep on /nginx-ingress itself, so there is no "escalation" (the IC process gets NET_BIND_SERVICE and then NGINX gets it too). This may be considered preferable from a security standpoint.
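As a sketch, that could be a build-time step in the image's Dockerfile (assuming libcap's setcap utility is available in the build stage; the path matches the entrypoint mentioned above):

```dockerfile
# Illustrative only: grant the capability to the binary at build time,
# so no runtime privilege escalation is needed for the entrypoint.
RUN setcap 'cap_net_bind_service=+ep' /nginx-ingress
```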

sigv avatar Feb 21 '23 10:02 sigv

An alternative approach could be to set cap_net_bind_service=+ep on /nginx-ingress itself, so there is no "escalation" (the IC process gets NET_BIND_SERVICE and then NGINX gets it too). This may be considered preferable from a security standpoint.

Sounds like a good approach!

hafe avatar Feb 21 '23 19:02 hafe

As part of the linked PR, it was identified that the underlying process does not drop NET_BIND_SERVICE once that capability is effective.

Therefore, with the existing approach, even if privilege escalation is restricted, code execution could, in an attack chain, result in binding the low ports.

Instead, nginxinc/kubernetes-ingress@8be01446762dcaae9a2916b3d59ca78c6ca5670f applies the sysctl change to remove the privilege requirement (and in effect remove the escalation use case).

For the scope of the Kubernetes policy, this complies (as it is a safe sysctl that individual Pods can set since K8s v1.22). In the future, someone may want to pick up the task of re-implementing the capability and adding a proper bind-and-drop.

You should be able to experiment on the main branch to collect information on what other policy changes are needed. Keep in mind you will have to build the image locally; you cannot rely on the previous release's image.

sigv avatar Mar 02 '23 19:03 sigv

@hafe, is there any aspect in which NIC does not comply with Restricted security level, other than allowPrivilegeEscalation?

This must also be set in the security context to comply:

seccompProfile:
  type: RuntimeDefault

Warning: existing pods in namespace "nginx-ingress" violate the new PodSecurity enforce level "restricted:latest"
Warning: nginx-nginx-ingress-7f55b6c8d4-zdhv8: allowPrivilegeEscalation != false, seccompProfile

blurpy avatar Mar 07 '23 11:03 blurpy

I like this! I write something like a requirement and others implement 😁

hafe avatar Mar 07 '23 13:03 hafe

This must also be set in the security context to comply:

seccompProfile:
  type: RuntimeDefault

@blurpy, would you mind opening a Pull Request for this with you as author?

sigv avatar Mar 07 '23 17:03 sigv

@blurpy, would you mind opening a Pull Request for this with you as author?

I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the meantime.

blurpy avatar Mar 09 '23 07:03 blurpy

I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the meantime.

Opened #3629 on your behalf.

sigv avatar Mar 09 '23 10:03 sigv

@sigv thank you for the PR. We're going to review it on our side. If we need anything or have any questions, we will update your PR thread.

jasonwilliams14 avatar Mar 10 '23 01:03 jasonwilliams14

@blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check whether there is any other low-hanging fruit that the scan picks up?

sigv avatar Mar 15 '23 16:03 sigv

@blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check whether there is any other low-hanging fruit that the scan picks up?

Excellent, thank you! I will have to get back to you on that.

blurpy avatar Mar 17 '23 09:03 blurpy

I finally got some time to play with the latest main. Nice work on all the security improvements lately! What I found is that running with UID 101 is now what stops us from using the restricted policy.

hafe avatar Mar 17 '23 16:03 hafe

What I found is that running with UID 101 is now what stops us from using the restricted policy.

@hafe, where are you seeing UID 101 being an issue? Is there a reference document/source you could link to? My understanding is that any non-zero UID should be okay, based on the current Pod Security Standards wording:

Running as Non-root user (v1.23+): Containers must not set runAsUser to 0

Restricted Fields:

  • spec.securityContext.runAsUser
  • spec.containers[*].securityContext.runAsUser
  • spec.initContainers[*].securityContext.runAsUser
  • spec.ephemeralContainers[*].securityContext.runAsUser

Allowed Values

  • any non-zero value
  • undefined/null

sigv avatar Mar 19 '23 11:03 sigv

OKD/OpenShift gives each namespace a UID range and allocates a random UID from that range to each Pod. If you need a fixed UID, you need to use the anyuid policy or a custom one. But I need to play with this more.

hafe avatar Mar 19 '23 12:03 hafe

Understood; it's about OpenShift's restricted-v2 security context constraint (restricted for OpenShift v4.10 and older). The restricted SCC "[..] Requires that a pod is run as a user in a pre-allocated range of UIDs".

This needs further investigation. I have not worked hands-on with OpenShift, so I am not 100% familiar with their approach. The Red Hat blog has "A Guide to OpenShift and UIDs", which seems like a decent entry point into the topic.

@hafe, if you instead apply the anyuid SCC, how does it look? The documentation linked above says it provides all features of the restricted SCC, but allows users to run with any UID and any GID.

sigv avatar Mar 19 '23 14:03 sigv

I will do some more checks, but otherwise I think this particular issue could be rephrased and closed.

hafe avatar Mar 19 '23 14:03 hafe

I am taking a closer look, and PodSecurityContext (v1) says runAsUser defaults to the user specified in image metadata if unspecified. OpenShift's "Example security context constraints" section also discusses the case where no explicit user ID is provided.

@hafe, could you check restricted-v2 by setting runAsUser: null, if you have a moment? A proposed diff is available in #3665. The OpenShift admission plugin should check openshift.io/sa.scc.uid-range and assign the first UID, if I am reading this right.
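The idea, sketched as a container securityContext (a sketch only; the point is leaving runAsUser unset so OpenShift can assign one from the namespace range):

```yaml
# Sketch: omit/null runAsUser so the OpenShift admission plugin assigns a UID
# from the namespace's openshift.io/sa.scc.uid-range annotation.
securityContext:
  runAsUser: null      # let the platform pick the UID
  runAsNonRoot: true
```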

sigv avatar Mar 19 '23 14:03 sigv

@sigv Initial testing looks good! We have the controller running with the restricted profile now.

blurpy avatar Mar 29 '23 06:03 blurpy

@blurpy, just to double-check, as there are competing request scopes: Kubernetes restricted with the latest release, or OpenShift with a modified 'run as user'?

sigv avatar Mar 30 '23 06:03 sigv

@sigv using the restricted pod security standard: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

The namespace is configured as follows:

kubectl describe ns nginx-ingress
Name:         nginx-ingress
Labels:       kubernetes.io/metadata.name=nginx-ingress
              pod-security.kubernetes.io/audit=restricted
              pod-security.kubernetes.io/audit-version=latest
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=latest
              pod-security.kubernetes.io/warn=restricted
              pod-security.kubernetes.io/warn-version=latest
Annotations:  <none>
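For reference, the same labels can be applied with kubectl (namespace name as in the output above; this of course requires access to a cluster):

```shell
# Enforce, warn on, and audit the "restricted" Pod Security Standard
# for the nginx-ingress namespace.
kubectl label namespace nginx-ingress \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/warn-version=latest \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=latest
```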

Still at the PoC stage though, so perhaps we will discover something more later.

blurpy avatar Mar 30 '23 07:03 blurpy

I originally thought about changing the templates so that high (>1024) ports could optionally be configured. These could then be used with normal (non-host) networking, removing the need for privileged mode. I never got around to testing it. Is this a feasible path?

hafe avatar Apr 07 '23 08:04 hafe

@hafe, are you on OpenShift 4.11+? Or are you running on the 4.10 Maintenance Support?

I am asking as I want to hear if you have the restricted-v2 SCC available.

sigv avatar May 17 '23 00:05 sigv