
Apply UID/GID defaults from image

Open sigv opened this pull request 1 year ago • 29 comments

Proposed changes

build/Dockerfile specifies USER 101 for common target, which is re-applied into the final images. Helm Chart/Manifests do not need to specify UID explicitly, and can instead use the image's UID. (PodSecurityContext v1 core specifies runAsUser defaults to user specified in image metadata if unspecified.)

The existing runAsNonRoot: true flag ensures at runtime that the image is configured with a non-root user ID.
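
As a rough sketch (not the literal Helm template; field placement in the actual chart may differ), the resulting pod spec keeps the non-root check without pinning a UID:

# Illustrative sketch only.
securityContext:
  runAsNonRoot: true
  # runAsUser intentionally omitted -- the UID is taken from the image metadata (USER 101)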

This is notably helpful for users running OpenShift, because OpenShift attempts to enforce custom UID/GID ranges for individual namespaces as part of the restricted-v2 Security Context Constraint. With the hard-coded values removed from the manifests, OpenShift can assign its own UID/GID.

In practice, this means a different model of configuring file system permissions. To assist with that, OpenShift assigns GID 0 as a supplemental group to the container process. Locations that are expected to be written to must be owned by GID 0, with group write permissions. Previous changes to main have ensured that is the case.
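
A minimal sketch of that ownership model (illustrative only; the exact paths and commands in build/Dockerfile may differ):

# Give the root group the same rights as the owning user on writable locations.
RUN chown -R 101:0 /etc/nginx /var/cache/nginx \
 && chmod -R g=u /etc/nginx /var/cache/nginx
USER 101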

The init container that copies files is not a concern, as it runs with the same UID as the main NIC container and therefore owns the same files.

Reference: A Guide to OpenShift and UIDs

Closes #5422.

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • [x] I have read the CONTRIBUTING doc
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] I have checked that all unit tests pass after adding my changes
  • [ ] I have updated necessary documentation
  • [x] I have rebased my branch onto main
  • [x] I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

sigv avatar Mar 19 '23 14:03 sigv

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 52.35%. Comparing base (8a2c9a9) to head (f38706e). Report is 257 commits behind head on main.

:exclamation: Current head f38706e differs from pull request most recent head ee75176. Consider uploading reports for the commit ee75176 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3665      +/-   ##
==========================================
+ Coverage   52.32%   52.35%   +0.03%     
==========================================
  Files          61       59       -2     
  Lines       17502    16880     -622     
==========================================
- Hits         9158     8838     -320     
+ Misses       8015     7747     -268     
+ Partials      329      295      -34     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 20 '23 16:03 codecov[bot]

Not sure if the CI / Smoke Tests (alpine, policies, 1.26.2) failure -- an internally observed 502 Bad Gateway response -- is relevant to this PR. 🤔

~Will double check if I can repro.~ Tests being re-run helped. I do not see how this PR can cause a transient test failure.

sigv avatar Mar 22 '23 06:03 sigv

Hey hey. Sorry, been busy with a few initiatives having tight delivery deadlines.

I did not test this out on OpenShift. I opened the PR based on community member feedback so it doesn't get lost, and checked myself that the Kubernetes side is functional.

OpenShift's blog/guidance covers the ownership/access concern:

The Container user is always a member of the root group, so it can read or write files accessible by GID=0. Any command invoked by the Entrypoint will be executed with this unprivileged UID and GID pair. That means, it is an unprivileged user executing the commands and the UID that will be used during execution is not known in advance. From the technical design perspective, that means, directories and files that may be written to by processes in the Container should be owned by the root group and be read/writable by GID=0. Files to be executed should also have group execute permissions.

Will mark this PR as a draft - need to double-check on the concern raised.

sigv avatar Apr 19 '23 07:04 sigv

Sorry for the lack of reply on this. The open initiatives are getting wrapped up so I have some time to look into this again.

Did you test this in OpenShift? If it runs as a random UID and we set all the permissions for the files to the nginx user, will it cause problems?

I personally operate Kubernetes clusters instead of OpenShift clusters. I don't feel like reading the licensing model by RedHat or registering for a 2-month developer trial, so I will spin up a basic Pay-as-You-Go OpenShift cluster in Azure instead. Microsoft explicitly requires Dsv3 (3rd generation D-series) VMs for OpenShift master nodes, so I just requested a quota change on my Azure account. Should be able to deploy soon and report how the file operations behave.

Shouldn't this be enforced in the SCC? Can't we just set 101 there? See https://github.com/nginxinc/nginx-ingress-helm-operator/blob/main/resources/scc.yaml#L7-L9

The linked issue requests a restricted-v2 SCC to be supported out-of-the-box. In particular, the issue is that the SCC "Requires that a pod is run as a user in a pre-allocated range of UIDs". (Reference: RedHat: Managing security context constraints)

I think that preferably means runAsUser.type: MustRunAsNonRoot, so that no default UID is specified in the SCC. Will double-check this during testing - a good time for me to learn a bit more about the OpenShift policies here.
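
For reference, the SCC stanza I have in mind would look roughly like this (a hedged fragment, still to be verified during testing):

# Hypothetical SCC fragment: enforce non-root without pinning a default UID.
runAsUser:
  type: MustRunAsNonRoot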

sigv avatar May 02 '23 18:05 sigv

Fun fact. When deploying an OpenShift cluster in Azure, there are actually three kinds of VMs that get deployed for you:

  • One bootstrap machine, which is removed after installation
  • Three control plane machines
  • Three compute machines

The three control plane machines can be D8s_v3, resulting in 24 cores used. But it appears the bootstrap machine also uses the D8s_v3 size, so the actual minimum quota requirement there is 32 cores (excluding worker/compute nodes). I am waiting for a further quota increase.

sigv avatar May 03 '23 17:05 sigv

Initial testing looks strange against OpenShift 4.10 which is what Azure deploys by default.

I am not seeing issues inside Pods, which would be expected if a UID that cannot access the files were used. Instead, the Pods themselves do not get created in the ReplicaSet. Should be able to dedicate more focus time to this tomorrow, or will just sit down with it over the weekend. 🤔

sigv avatar May 11 '23 12:05 sigv

So, a quick sanity check that it works on local K8s without fancy add-ons:

# install microk8s locally
sudo snap install microk8s --classic
microk8s status --wait-ready

# or reset an existing microk8s installation
sudo microk8s reset

# enable basic services on microk8s
microk8s enable dashboard dns registry istio

microk8s helm install nginx-ingress deployments/helm-chart \
  --namespace nginx-ingress --create-namespace \
  --set controller.image.repository=ghcr.io/nginxinc/kubernetes-ingress \
  --set controller.image.tag=edge \
  --set controller.service.type=NodePort

microk8s kubectl -n nginx-ingress get pods
# NAME                                        READY   STATUS    RESTARTS   AGE
# nginx-ingress-controller-6bd8954f55-rxsc9   1/1     Running   0          60s

microk8s kubectl -n nginx-ingress exec -it nginx-ingress-controller-6bd8954f55-rxsc9 -- /usr/bin/id -u
# 101

Currently dropping a few findings and notes. OpenShift has interesting security features built in. To summarize, it wants to run things more restricted than "native Kubernetes".

To be honest, I am a little annoyed by RedHat and/or Microsoft, as I was working based on expectations from the latest documentation. When deploying installer-provisioned infrastructure in Microsoft Azure you get OpenShift v4.10 (current is v4.12). Why this matters is that v4.11 introduces the restricted-v2 security context constraints. When running v4.10, you only have the older restricted security context constraints, which do not seem to allow the NET_BIND_SERVICE capability to be added explicitly -- that's the CAPS column below, listing <no value> instead of ["NET_BIND_SERVICE"].

oc get scc privileged restricted restricted-v2
# NAME         PRIV    CAPS         SELINUX     RUNASUSER        FSGROUP     SUPGROUP   PRIORITY     READONLYROOTFS   VOLUMES
# privileged   true    ["*"]        RunAsAny    RunAsAny         RunAsAny    RunAsAny   <no value>   false            ["*"]
# restricted   false   <no value>   MustRunAs   MustRunAsRange   MustRunAs   RunAsAny   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
# Error from server (NotFound): securitycontextconstraints.security.openshift.io "restricted-v2" not found

When trying to deploy the Helm chart, the error event observed on the ReplicaSet (unable to create the Pod) was initially not clear to me. But the "providers" it lists are just the security context constraints that it tries to evaluate.

pods "nginx-ingress-controller-deadbeef00-" is forbidden: unable to validate against any security context constraint: [
provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added,
provider "nonroot": Forbidden: not usable by user or serviceaccount,
provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount,
provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount,
provider "hostnetwork": Forbidden: not usable by user or serviceaccount,
provider "hostaccess": Forbidden: not usable by user or serviceaccount,
provider "node-exporter": Forbidden: not usable by user or serviceaccount,
provider "privileged": Forbidden: not usable by user or serviceaccount,
provider "privileged-genevalogging": Forbidden: not usable by user or serviceaccount]

And Forbidden: not usable by user or serviceaccount can be resolved by adding a RoleBinding for the security context constraint to the nginx-ingress ServiceAccount. I am not sure why the restricted SCC is not applied automatically, or why no failure information for the restricted SCC is visible to me in the logs.
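
For reference, the RBAC form of that grant looks roughly like the following (a hedged sketch; the oc adm policy add-scc-to-user commands below achieve the same effect):

# Hypothetical Role/RoleBinding letting the ServiceAccount use an SCC.
# Names and namespace are assumptions matching the commands below.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scc-restricted-user
  namespace: nginx-ingress
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["restricted"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scc-restricted-user
  namespace: nginx-ingress
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scc-restricted-user
subjects:
  - kind: ServiceAccount
    name: nginx-ingress
    namespace: nginx-ingress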

oc new-project nginx-ingress --display-name='Nginx Ingress Controller'

oc -n nginx-ingress adm policy add-scc-to-user restricted -z nginx-ingress
oc -n nginx-ingress adm policy add-scc-to-user privileged -z nginx-ingress

helm install nginx-ingress deployments/helm-chart \
  --namespace nginx-ingress \
  --set controller.image.repository=ghcr.io/nginxinc/kubernetes-ingress \
  --set controller.image.tag=edge

kubectl -n nginx-ingress exec -it nginx-ingress-controller-689d96577b-cj2kv -- /usr/bin/id -u
# 101

The privileged SCC is as open as it gets, and that seems to work.

I want to poke around a bit more -- need to manually configure a restricted SCC variant that allows NET_BIND_SERVICE, to see how the additional security safeguards change the UID behavior. Should probably try an upgrade to v4.11 as well, as I would like to see the built-in restricted-v2 in action.
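
The custom variant I have in mind would look roughly like the following (an untested sketch; the fields mirror the restricted SCC output above, with only the capability added):

# Hypothetical SCC: restricted-like, but NET_BIND_SERVICE may be added explicitly.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: restricted-net-bind
allowPrivilegedContainer: false
allowedCapabilities:
  - NET_BIND_SERVICE
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
readOnlyRootFilesystem: false
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret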

sigv avatar May 14 '23 17:05 sigv

~I need to check on the difference that edge-ubi brings, as the Building the Ingress Controller Image page says it's for OpenShift clusters. That's nginxcontrib/nginx:1.23.4-ubi as the base image, instead of the default nginx:1.23.4.~

Looking at Characteristics of UBI images, UBI (Universal Base Images) is a standardized base image offering from RedHat. You end up with the base layers cached in a predictable manner across various applications, and RedHat provides its pre-approved runtimes and other minor tweaks there.

Will re-test later with the UBI images, but I expect them to functionally behave the same.

sigv avatar May 14 '23 17:05 sigv

Red Hat OpenShift Service on AWS (ROSA) needs some manual configuration to authorize the RedHat Cloud Console to access your AWS account. But after the linking is done, you have a smooth deployment process from the RedHat Console. It allows picking a specific version from a dropdown, so the deployed cluster is already upgraded for you, instead of having to rely on the pre-selected version as in Microsoft Azure and then selecting an upgrade. Small things, but they can matter when you are messing around with cluster deployments.


About OpenShift versioning: v4.10 (released March 2022) is in Maintenance Support until September 2023. This is the version that does not have restricted-v2 security context constraints available. (Docs: Red Hat OpenShift Container Platform Life Cycle Policy)

I was not able to configure an SCC based on restricted that gets picked up in the nginx-ingress namespace. Something is causing the built-in restricted SCC to be ignored, so if I base my new template on it then it gets ignored as well. (Or maybe I am missing something else.)

In any case, running NIC as privileged SCC seems to be functional, just as in plain K8s.

oc new-project nginx-ingress --display-name='Nginx Ingress Controller'
oc adm policy add-scc-to-user privileged -z nginx-ingress -n nginx-ingress
helm install nginx-ingress deployments/helm-chart \
  --namespace nginx-ingress \
  --set controller.image.repository=ghcr.io/nginxinc/kubernetes-ingress \
  --set controller.image.tag=edge-ubi

kubectl -n nginx-ingress get pods
# NAME                                        READY   STATUS    RESTARTS   AGE
# nginx-ingress-controller-6988bc87d9-fckm5   1/1     Running   0          30s
kubectl -n nginx-ingress exec -it nginx-ingress-controller-6988bc87d9-fckm5 -- /bin/id -u
# 101

I then deleted the nginx-ingress namespace and asked the RedHat Console to upgrade my cluster to the 4.11 channel. When starting on 4.11+, the Pod gets runAsUser: 1001040000 assigned and accordingly fails with:

Failed to write main config: failed to open /etc/nginx/nginx.conf: open /etc/nginx/nginx.conf: permission denied

If the Helm chart is applied with the controller.readOnlyRootFilesystem=true value, then pod admission also doesn't like fsGroup being set to 101 explicitly.
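
Both behaviors trace back to the per-namespace ranges OpenShift allocates; a quick way to inspect them (hedged example, the values differ per cluster and namespace):

oc get namespace nginx-ingress -o yaml | grep 'openshift.io/sa.scc'
#   openshift.io/sa.scc.supplemental-groups: 1001040000/10000   <- example value; fsGroup must fall in this range, so a hard-coded 101 is rejected
#   openshift.io/sa.scc.uid-range: 1001040000/10000             <- example value; restricted-v2 picks runAsUser from here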


So I am currently thinking about how to patch up the project for GID=0 as well as make fsGroup flexible. Will follow up with further changes.

sigv avatar May 16 '23 20:05 sigv

Removing fsGroup for readOnlyRootFilesystem works based on the fact that the emptyDir gets created with mode 777.

oc exec -n nginx-ingress -it nginx-ingress-controller-778c5c8fbf-tvxlm -c nginx-ingress -- /bin/bash
bash-5.1$ id -u
# 1001090000
bash-5.1$ id -g
# 0
bash-5.1$ ls -al /var/cache/nginx/
# total 0
# drwxrwsrwx. 7 root       1001090000 98 May 16 20:52 .
# drwxr-xr-x. 1 root       root       19 Apr 12 03:52 ..
# drwx--S---. 2 1001090000 1001090000  6 May 16 20:52 client_temp
# drwx--S---. 2 1001090000 1001090000  6 May 16 20:52 fastcgi_temp
# drwx--S---. 2 1001090000 1001090000  6 May 16 20:52 proxy_temp
# drwx--S---. 2 1001090000 1001090000  6 May 16 20:52 scgi_temp
# drwx--S---. 2 1001090000 1001090000  6 May 16 20:52 uwsgi_temp

Will need to be re-tested on plain Kubernetes.

sigv avatar May 16 '23 21:05 sigv

OpenShift SCCs are actually fairly straightforward. They are a programmatic way to enforce a policy, and they hook into the pod admission process. Nothing crazy. Except that sometimes the event messages do not include the reason why the restricted constraint cannot be applied. Go figure.

Overall, this ended up being a bit larger a change than I initially thought. Here is the run-down.


The change removes runAsUser and fsGroup, so the UID and GID assignments are not enforced by the Helm chart template directly.

On Kubernetes, the UID is inherited from the image metadata (so it retains UID 101 as now), while the GID can be left to defaults (we don't depend on it).
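
A quick way to confirm what the image metadata declares (assuming Docker is available locally and using the edge image referenced earlier):

docker pull ghcr.io/nginxinc/kubernetes-ingress:edge
docker inspect --format '{{.Config.User}}' ghcr.io/nginxinc/kubernetes-ingress:edge
# expected to print 101, matching USER 101 in build/Dockerfile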


For OpenShift 4.11+, this change means that the default locked-down restricted-v2 security context constraints are functional. Pods can get a randomized UID/GID.

Even though UID/GID are randomized, a supplemental GID 0 is applied. That means the NIC process can work with files and directories where group ownership is set to root.

The Dockerfile already configures ownership to GID 0, so we adjust permissions to give the group the same rights as the owner user.


I know the /var/log/nginx directory was added as a potential write location when implementing the read-only root filesystem. The Dockerfile now also changes ownership of this directory, in the same manner as for /etc/nginx and the other /var locations.

The directory was previously mode 755, owned by root:root. Two files existed in it -- access.log and error.log -- with mode 640, owned by nginx:adm. The ownership is now changed to nginx:root, with mode 775 for the directory and 660 for the files.


Built a UBI image with make ubi-image TARGET=container and validated that it deploys to OpenShift 4.11 with:

helm install nginx-ingress deployments/helm-chart \
  --namespace nginx-ingress --create-namespace \
  --set controller.image.repository=docker.io/sigv/nginx-ingress \
  --set controller.image.tag=cbe5594-ubi

I did a quick sanity check that the controller responds with an HTTP 404.

I then applied r/o rootfs with:

helm upgrade nginx-ingress deployments/helm-chart -n nginx-ingress \
  --set controller.image.repository=docker.io/sigv/nginx-ingress \
  --set controller.image.tag=cbe5594-ubi \
  --set controller.readOnlyRootFilesystem=true

..and observed the same HTTP 404.


On local machine,

microk8s helm install nginx-ingress deployments/helm-chart \
  --namespace nginx-ingress --create-namespace \
  --set controller.image.repository=docker.io/sigv/nginx-ingress \
  --set controller.image.tag=cbe5594-debian \
  --set controller.service.type=NodePort

..starts up successfully.


I redeployed OpenShift version 4.10. The privileged SCC appears to function as before:

oc new-project nginx-ingress --display-name='Nginx Ingress Controller'
oc adm policy add-scc-to-user privileged -z nginx-ingress -n nginx-ingress
helm install nginx-ingress deployments/helm-chart \
  --namespace nginx-ingress \
  --set controller.image.repository=docker.io/sigv/nginx-ingress \
  --set controller.image.tag=cbe5594-ubi

~Will run the OSS test suite myself as well, to verify there is no strangeness going on.~ The OSS test suite passes locally.

OBS: If the workflow gets run on GitHub Actions, then it needs to be run as protected inside the official repository -- there is a Dockerfile change, which means the cache cannot be reused, and the Plus secrets are needed for the new build.

sigv avatar May 16 '23 22:05 sigv

Executive summary: This change allows deployment to OpenShift (v4.11+) without having to manually configure custom security constraints.

sigv avatar May 17 '23 00:05 sigv

@lucacome, should I maybe split this apart into smaller PRs based on the technical changes to be reviewed independently? Or is this okay as-is, one functional change for OpenShift users?

sigv avatar May 17 '23 05:05 sigv

I split off #3925 and #3926 to deal with clean-up changes that were initially included in this change but can be reviewed individually. Hopefully this makes review easier on those two tasks.

I will rebase this PR once they are approved, so that the change scope here is halved, and update the description accordingly. 🤞🏻

sigv avatar May 19 '23 17:05 sigv

@lucacome, should I split the Dockerfile changes into separate PRs - for chown 101:0 /var/log/nginx and for chmod g=u?

That way this PR would only consist of the runAsUser removal. That is the tricky part that can break some custom images, where people add USER 0 in their Dockerfile but do not switch back to USER 101 later.
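
A hypothetical example of the pattern that breaks (the base tag and package command are made up for illustration):

# Downstream image that switches to root for extra tooling but never switches back.
# With runAsUser removed from the chart, this pod would start as UID 0 and be
# rejected by runAsNonRoot: true.
FROM nginx/nginx-ingress:edge
USER 0
RUN apt-get update && apt-get install -y curl
# Missing: USER 101   <- must be restored for the non-root default to apply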

sigv avatar Jun 01 '23 14:06 sigv

Thinking more about this, we probably don't want to touch /var/log/nginx as access.log is symlinked to /dev/stdout and error.log is symlinked to /dev/stderr. But in that case, we should probably drop /var/log/nginx from the emptyDir configurations for read-only rootfs.
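
For context, a rough sketch of the emptyDir piece that could be dropped (names are assumptions, not the literal chart template):

# Writable mount over /var/log/nginx used for read-only rootfs today.
volumes:
  - name: nginx-log
    emptyDir: {}
containers:
  - name: nginx-ingress
    volumeMounts:
      - name: nginx-log
        mountPath: /var/log/nginx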

This stays as WIP. ⏳

sigv avatar Jun 05 '23 10:06 sigv

I split off the chmod g=u change for review as #3962 in the meantime.

sigv avatar Jun 05 '23 19:06 sigv

This was a reply to a comment that has since been removed.

The proposed diff is visible in this Pull Request's Files changed tab.

If you are using the Helm chart, the relevant references are in deployments/helm-chart/templates/ -- depending on whether a DaemonSet or Deployment is used, it's either controller-daemonset.yaml or controller-deployment.yaml respectively.

If you are instead using the static manifests, the relevant references are in deployments/daemon-set or deployments/deployment depending on the installation type, as above. Either the nginx-ingress.yaml or nginx-plus-ingress.yaml file will be relevant, depending on whether you run NGINX OSS or NGINX Plus.
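
In either case, the piece this PR removes looks roughly like this (paraphrased, not an exact excerpt from the templates):

# Container securityContext in those manifests/templates (paraphrased).
securityContext:
  runAsUser: 101      # removed by this PR; the UID then comes from the image
  runAsNonRoot: true  # kept
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]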

However, do keep in mind that there is a proposed change to build/Dockerfile pending as well, so you would have to manually build an image that you load into your OpenShift cluster. The latest release build does not include these prerequisite changes.

A notable part of the Dockerfile change is pending review as #3962. There is still a concern that if you are currently writing anything to /var/log/nginx then this scope would not be enough. However, for a vanilla configuration, once that scope is merged, the latest edge build should satisfy your requirements.

@andrescolodrero, if you still experience issues, let me know what OpenShift version you are running and what exact commands you are using to carry out the installation.

sigv avatar Jul 25 '23 22:07 sigv

@sigv I'm using OpenShift 4.12

Installation is done via Operator, basic setup from the UI. Operator version: nginx-ingress-operator.v1.5.0

andrescolodrero avatar Jul 25 '23 22:07 andrescolodrero

Yes. The Operator is not yet in scope for the removal of anyuid. This is a slow work-in-progress.

sigv avatar Jul 25 '23 22:07 sigv

Ok, thanks @sigv. I also tried a Helm install with the same results; I will check building my own image of the ingress controller.

andrescolodrero avatar Jul 26 '23 08:07 andrescolodrero

Yes. The Operator is not yet in scope for the removal of anyuid. This is a slow work-in-progress.

Considering that our Operator is derived from the Helm chart... We need to make sure it works. We were talking about this on our end this morning.

brianehlert avatar Jul 26 '23 22:07 brianehlert

With #4269 merged, I will dust this off (merge conflicts) and look into relevant Operator changes.

There are currently other initiatives that I am working on, but later this month I should be able to work on this change too.

sigv avatar Nov 08 '23 15:11 sigv

Deploy request for nginx-kubernetes-ingress pending review.

Visit the deploys page to approve it

Latest commit: ee751765b5414d97570dbc12a348826a89fb1a69

netlify[bot] avatar Jan 30 '24 09:01 netlify[bot]

Not seeing any issues with Operator from my end. Opened nginxinc/nginx-ingress-helm-operator#254 to modify the SCC included in the repository there.

Let me know what testing would be preferable to be done from my side.

sigv avatar Mar 08 '24 03:03 sigv

Hi @sigv, can you create a new issue for this, as the one linked in the description seems incomplete and WIP? We can then prioritise and review the PR, thanks!

vepatel avatar Apr 10 '24 12:04 vepatel

@vepatel, opened a fresh #5422 for this scope. 🤞🏻

sigv avatar Apr 18 '24 07:04 sigv

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Aug 07 '24 01:08 github-actions[bot]

Hi @sigv, now that securityContext is configurable using Helm, is this still required?

j1m-ryan avatar Aug 12 '24 15:08 j1m-ryan