Supply the contour configurations with PodSecurityPolicy and related RBAC configs

Open surajssd opened this issue 5 years ago • 20 comments

What steps did you take and what happened:

I deployed Contour onto a cluster that has PSP enabled, and most things did not work.

$ kubectl apply -f examples/contour

After that, this was the pod status:

$ kubectl get pods
NAME                       READY   STATUS                       RESTARTS   AGE
contour-6f4746b855-4cpj9   0/1     ContainerCreating            0          10m
contour-6f4746b855-cvn9k   0/1     ContainerCreating            0          10m
contour-certgen-kz4nt      0/1     CreateContainerConfigError   0          11m
  • contour-certgen-kz4nt is in CreateContainerConfigError because the process inside it runs as root.
  • The envoy DaemonSet did not come up because it needs elevated privileges; this is what I see in the events:
4m42s       Warning   FailedCreate        daemonset/envoy
Error creating: pods "envoy-" is forbidden: unable to validate against any pod
security policy: [spec.containers[0].hostPort: Invalid value: 80: Host port 80
is not allowed to be used. Allowed ports: [] spec.containers[0].hostPort: Invalid
value: 443: Host port 443 is not allowed to be used. Allowed ports: []]
  • The contour Deployment has been waiting for volumes to be mounted that are populated by the certgen job:
3m37s       Warning   FailedMount         pod/contour-6f4746b855-cvn9k
Unable to attach or mount volumes: unmounted volumes=[contourcert cacert],
unattached volumes=[contourcert cacert contour-config contour-token-8ntjh]:
timed out waiting for the condition

What did you expect to happen:

I expect the Contour project to ship a PSP (and the related RBAC configs) in the examples/contour directory, so that when I run the following command all the components get deployed properly without issues.

kubectl apply -f examples/contour

Anything else you would like to add:

Events during the deployment:

$ kubectl get events
LAST SEEN   TYPE      REASON              OBJECT                          MESSAGE
<unknown>   Normal    Scheduled           pod/contour-6f4746b855-4cpj9    Successfully assigned projectcontour/contour-6f4746b855-4cpj9 to suraj-cluster-helium-worker-0
3m59s       Warning   FailedMount         pod/contour-6f4746b855-4cpj9    MountVolume.SetUp failed for volume "cacert" : secret "cacert" not found
117s        Warning   FailedMount         pod/contour-6f4746b855-4cpj9    MountVolume.SetUp failed for volume "contourcert" : secret "contourcert" not found
3m35s       Warning   FailedMount         pod/contour-6f4746b855-4cpj9    Unable to attach or mount volumes: unmounted volumes=[cacert contourcert], unattached volumes=[cacert contour-config contour-token-8ntjh contourcert]: timed out waiting for the condition
5m53s       Warning   FailedMount         pod/contour-6f4746b855-4cpj9    Unable to attach or mount volumes: unmounted volumes=[contourcert cacert], unattached volumes=[contour-token-8ntjh contourcert cacert contour-config]: timed out waiting for the condition
<unknown>   Normal    Scheduled           pod/contour-6f4746b855-cvn9k    Successfully assigned projectcontour/contour-6f4746b855-cvn9k to suraj-cluster-helium-worker-2
117s        Warning   FailedMount         pod/contour-6f4746b855-cvn9k    MountVolume.SetUp failed for volume "contourcert" : secret "contourcert" not found
3m59s       Warning   FailedMount         pod/contour-6f4746b855-cvn9k    MountVolume.SetUp failed for volume "cacert" : secret "cacert" not found
8m8s        Warning   FailedMount         pod/contour-6f4746b855-cvn9k    Unable to attach or mount volumes: unmounted volumes=[contourcert cacert], unattached volumes=[contour-config contour-token-8ntjh contourcert cacert]: timed out waiting for the condition
5m52s       Warning   FailedMount         pod/contour-6f4746b855-cvn9k    Unable to attach or mount volumes: unmounted volumes=[cacert contourcert], unattached volumes=[cacert contour-config contour-token-8ntjh contourcert]: timed out waiting for the condition
3m37s       Warning   FailedMount         pod/contour-6f4746b855-cvn9k    Unable to attach or mount volumes: unmounted volumes=[contourcert cacert], unattached volumes=[contourcert cacert contour-config contour-token-8ntjh]: timed out waiting for the condition
10m         Normal    SuccessfulCreate    replicaset/contour-6f4746b855   Created pod: contour-6f4746b855-cvn9k
10m         Normal    SuccessfulCreate    replicaset/contour-6f4746b855   Created pod: contour-6f4746b855-4cpj9
<unknown>   Normal    Scheduled           pod/contour-certgen-kz4nt       Successfully assigned projectcontour/contour-certgen-kz4nt to suraj-cluster-helium-worker-1
12s         Normal    Pulling             pod/contour-certgen-kz4nt       Pulling image "docker.io/projectcontour/contour:master"
8m53s       Normal    Pulled              pod/contour-certgen-kz4nt       Successfully pulled image "docker.io/projectcontour/contour:master"
8m53s       Warning   Failed              pod/contour-certgen-kz4nt       Error: container has runAsNonRoot and image will run as root
10m         Normal    SuccessfulCreate    job/contour-certgen             Created pod: contour-certgen-kz4nt
10m         Normal    ScalingReplicaSet   deployment/contour              Scaled up replica set contour-6f4746b855 to 2
4m42s       Warning   FailedCreate        daemonset/envoy                 Error creating: pods "envoy-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].hostPort: Invalid value: 80: Host port 80 is not allowed to be used. Allowed ports: [] spec.containers[0].hostPort: Invalid value: 443: Host port 443 is not allowed to be used. Allowed ports: []]

The contour pods picked up the following PSP:

$ kubectl get pods contour-6f4746b855-4cpj9 -o yaml | grep -i psp
    kubernetes.io/psp: restricted

and the same for the job pod:

$ kubectl get pods contour-certgen-kz4nt -o yaml | grep psp
    kubernetes.io/psp: restricted

Here is what the restricted PSP looks like:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName:  'docker/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
  - KILL
  - MKNOD
  - SETUID
  - SETGID
  # Allow core volume types.
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  # Assume that persistentVolumes set up by the cluster admin are safe to use.
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: restricted-psp
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - restricted
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: restricted-psp-system-authenticated
roleRef:
  kind: ClusterRole
  name: restricted-psp
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

Environment:

  • Contour version:

Deployed Contour at the following version: 44d2aa6b02f91c8468be0369dcfa58c5a7388523

  • Kubernetes version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version:

https://github.com/kinvolk/lokomotive-kubernetes/ at version fe3cf9190e054642dff88e1848f4edc64636dfa7

  • Cloud provider or hardware configuration:

Packet cloud, with t1.small for master node and s1.large for workers.

  • OS:
$ cat /etc/os-release 
NAME="Flatcar Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=2079.5.0
VERSION_ID=2079.5.0
BUILD_ID=2019-06-05-1748
PRETTY_NAME="Flatcar Linux by Kinvolk 2079.5.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar-linux.org/"
BUG_REPORT_URL="https://issues.flatcar-linux.org"
FLATCAR_BOARD="amd64-usr"

surajssd avatar Nov 13 '19 13:11 surajssd

I will be willing to take this forward if nobody minds.

surajssd avatar Nov 13 '19 13:11 surajssd

/cc @youngnick

On 13 Nov 2019, at 14:53, Suraj Deshmukh [email protected] wrote:

 I will be willing to take this forward if nobody minds.

davecheney avatar Nov 13 '19 14:11 davecheney

Thanks for this @surajssd, this looks interesting, as does your related PR. I will check it out in detail today.

youngnick avatar Nov 13 '19 21:11 youngnick

I've checked out the PR, and on an initial pass it looks okay, with some changes needed. Like I said in the PR though, I don't want to approve this without giving it a proper run and test myself, and I won't have time in the next week. It's been a little while since I did PSP stuff, and I want to verify I haven't missed anything else.

Sorry about the delay, this is great work and I would like to get it in when we can.

youngnick avatar Nov 14 '19 04:11 youngnick

After talking to a few people (including @surajssd) at Kubecon, I don't think that adding a PSP is the right way to go here.

Effectively, what we need to do is:

  • Document the security capabilities that the Contour and Envoy pods need to do their jobs.
  • Ensure that they are applied to the pods.

In addition, there is one big problem with the current implementation of PSPs: there is no way to ensure that the PSP you create will be applied. If multiple PSPs match, then the first in alphabetical order will be applied. This is an acknowledged weakness of the current design, and is one of the reasons that PSPs in their current form are likely to be deprecated, and replaced with something else.

To that end, I think that using the pod.spec.securityContext would be the right way to go here, and have that set what the pod requires. That way, we can have the pod's security requirements fit into one of the common security buckets, or people can easily create their own, whatever security feature they use.
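
For illustration, a minimal sketch of what that could look like on the contour Deployment (only the relevant fields are shown; the UID/GID values are placeholders rather than the project's chosen ones):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: contour
  namespace: projectcontour
spec:
  template:
    spec:
      # Pod-level defaults: everything in the pod runs as a non-root user.
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534    # placeholder non-root UID
        runAsGroup: 65534   # placeholder non-root GID
      containers:
      - name: contour
        # Container-level hardening on top of the pod defaults.
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL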

youngnick avatar Nov 25 '19 23:11 youngnick

I was about to open an issue around hardening the default deployment of contour and envoy.

Agreed that contour should not ship a PSP, but instead set the security context to the minimum privileges required.

I experimented with this a bit and ran into an issue with hardening Envoy. I tried setting the user to non-root, dropping all capabilities, and adding NET_BIND_SERVICE, to no avail. It seems there is currently no way to grant capabilities to non-root users in Kubernetes. See https://github.com/kubernetes/kubernetes/issues/56374
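
Roughly, the combination I tried looks like this (a sketch; the UID is illustrative):

# Attempted hardening of the envoy container. The added NET_BIND_SERVICE
# capability never becomes effective for a non-root user, so Envoy still
# cannot bind host ports 80/443.
securityContext:
  runAsNonRoot: true
  runAsUser: 101                 # illustrative non-root UID
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE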

The nginx ingress controller seems to be getting around this by adding the capability during the docker build: https://github.com/kubernetes/ingress-nginx/blob/3ffe85537fac9faf9bcdfc845de0d6979c8a3ff6/rootfs/Dockerfile#L53

alexbrand avatar Nov 27 '19 18:11 alexbrand

I don't think that adding a PSP is the right way to go here

The only real argument I see for this is that multiple matching PSPs have undefined results. Is there something I'm missing in my skimming of the issue?

I think that there are effectively two main environments to consider:

  1. PSP not enabled (e.g. vanilla GKE), in which case it's harmless.
  2. PSP enabled (e.g. GKE with this), in which case I have to hand-roll a PSP because one wasn't provided.

I know you generally expect folks to customize the Contour configurations (perhaps heavily), but it seems like the best "example" you could provide would work in both of the above environments out of the box.

mattmoor avatar Apr 16 '20 04:04 mattmoor

The problem is that, because of the multiple-matching issue, if folks already have PSPs enabled and we add more, the install may or may not work, depending on what they've done with their PSPs.

I can see an argument for having an example PSP optionally available (maybe in a separate examples/ directory), but I don't want to put it in the default install.

youngnick avatar Apr 16 '20 04:04 youngnick

Additionally, PSPs are being deprecated in favour of OPA/Gatekeeper integrations, as far as I heard from the sig-auth update in San Diego.

youngnick avatar Apr 16 '20 04:04 youngnick

@youngnick Hmm, they are still listed as Beta in 1.18 🤔

If you have any more info on the deprecation, I'd love to know more before I spend a bunch of cycles plumbing them places in Knative 😅

mattmoor avatar Apr 16 '20 04:04 mattmoor

@mattmoor If you have the following privileged PSP installed in your cluster:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: privileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
spec:
  privileged: true
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  volumes:
  - '*'
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'

I made the following custom changes to the contour configs and things worked for me:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: privileged-psp
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - privileged
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: envoy
  namespace: projectcontour
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: envoy-privileged-psp
  namespace: projectcontour
roleRef:
  kind: ClusterRole
  name: privileged-psp
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: envoy
  namespace: projectcontour
---
# NOTE: Add the above "envoy" ServiceAccount to the "envoy" DaemonSet in the field `spec.template.spec.serviceAccountName`
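#
# For illustration, a hypothetical excerpt of the envoy DaemonSet after that
# change (only the added field and its surroundings are shown):
#
#   apiVersion: apps/v1
#   kind: DaemonSet
#   metadata:
#     name: envoy
#     namespace: projectcontour
#   spec:
#     template:
#       spec:
#         serviceAccountName: envoy   # the ServiceAccount created above
#         ...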

For the contour deployment itself, any restricted PSP would be OK, because it just talks to the apiserver and does not need any extra privileges on the node.

surajssd avatar Apr 16 '20 06:04 surajssd

Oh, I got PSPs working with Contour; I just felt like they might be useful as part of the "example" configuration that's shipped.

mattmoor avatar Apr 16 '20 14:04 mattmoor

Why can't we have both? I think we should add an example (along with docs) of how to implement this with Contour. We can keep the quick start clean, since its goal is to get up and running quickly in a simple environment, but have other examples of how to implement PSPs if a user requires that type of deployment.

stevesloka avatar Apr 16 '20 14:04 stevesloka

We can keep the quick start clean since its goal is to get up and running quickly

From what I've seen, PSPs are benign until you enable them, so I'm not sure why they'd compete with the "quickstart" goal, but I may be missing something.

mattmoor avatar Apr 16 '20 15:04 mattmoor

Ahh ok I see your point @mattmoor. I was just going off of what @youngnick suggested. I don't have a ton of experience with PSPs.

stevesloka avatar Apr 16 '20 15:04 stevesloka

I was pointed at this: https://www.youtube.com/watch?v=SFtHRmPuhEw&feature=youtu.be&t=920

mattmoor avatar Apr 16 '20 15:04 mattmoor

So, it turns out the recommendation from sig-auth here is to supply PSPs for now, and @mattmoor is right: they should be benign in clusters that don't have PSP enabled.

So, we will need to add something like this for Envoy and possibly the certgen job.

The envoy work is blocked on #2374 then - adding a serviceaccount to the Envoy example deployment. Once that's done, we can reference that serviceaccount in the example and bind the new PSP to it.

youngnick avatar Apr 17 '20 00:04 youngnick

As an update to what @youngnick mentioned a few years back (https://github.com/projectcontour/contour/issues/1896#issuecomment-558387020): PodSecurityPolicies have been deprecated and are scheduled to be removed in Kubernetes v1.25.

The replacement, Pod Security Admission (currently beta), works by applying labels to the namespace to restrict the maximum level of privileges that pods can get in that namespace. These levels are pre-defined by the Pod Security Standards. In theory, we could apply these in the example manifest for the projectcontour namespace, but on the other hand I think enforcing a policy is something the cluster administrator is responsible for, while the responsibility of the application (developers) is to document and use the minimum required privileges. It makes less sense for the application to enforce a policy on itself; that would be a "fox guarding the hen house" situation :).
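
For reference, opting a namespace into one of those pre-defined levels is just a matter of namespace labels, for example (illustrative; which level to enforce is the cluster administrator's call):

apiVersion: v1
kind: Namespace
metadata:
  name: projectcontour
  labels:
    # Pod Security Admission labels; "baseline" here is only an example,
    # the enforced level is the administrator's decision.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted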

tsaarni avatar Feb 23 '22 07:02 tsaarni

Agreed and thanks @tsaarni. I think that we should ensure that our example deployments meet the Baseline security standard at a minimum, and ideally they should work in a Restricted namespace as well. I don't know if that's achievable with Envoy though.
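
For context, a container securityContext that lines up with the Restricted standard looks roughly like this (a sketch; Envoy's use of host ports 80/443 is the likely sticking point, since Restricted only allows adding back NET_BIND_SERVICE and, per the kubernetes issue linked above, that does not help a non-root Envoy):

# Sketch of a securityContext aimed at the Restricted pod security standard.
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  seccompProfile:
    type: RuntimeDefault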

youngnick avatar Feb 25 '22 04:02 youngnick

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions[bot] avatar May 23 '24 00:05 github-actions[bot]
