Changes needed in the gloo-ee / gloo helm charts for 1.25 compatibility with a namespace using restricted Pod Security Standards (PSS)

Open ably77 opened this issue 1 year ago • 10 comments

Gloo Edge Product

Open Source

Gloo Edge Version

latest

Kubernetes Version

1.25

Describe the bug

Summary: Issues when deploying Gloo Edge on 1.25 with a restricted Pod Security Standard (PSS) profile

  1. gloo-ee/charts/gloo/templates/19-gloo-mtls-certgen-job.yaml container: certgen does not allow setting a complete podSecurityContext (PSC).
  2. gloo-ee/charts/gloo/templates/3-discovery-deployment.yaml containers have a hardcoded PSC, missing seccompProfile / ability to override it.
  3. gloo-ee/charts/gloo/templates/5-resource-cleanup-job.yaml container kubectl has a hardcoded PSC without SeccompProfile / Drop of capabilities.
  4. gloo-ee/charts/gloo/templates/5-resource-migration-job.yaml same as number 3.
  5. gloo-ee/charts/gloo/templates/5-resource-rollout-job.yaml same as 3 and 4.
  6. gloo-ee/charts/gloo/templates/6.5-gateway-certgen-job.yaml same as 3 and 4.
  7. gloo-ee/templates/70-resource-rollout-job.yaml same as 3 / 4.
  8. gloo-ee/templates/_helpers.tpl gloo.extauthinitcontainers template does not allow setting a PSC.
  9. Several helm-hooks do not set resource request/limits.

IMHO, most of these changes affect "single-shot" pods and amount to adding a default PSC that matches a restricted namespace; the only exception is the template helper (item 8).

Expected Behavior

Gloo Edge OSS and Gloo Edge Enterprise should be able to be deployed in Kubernetes 1.25 with the standards set forth by the restricted PSS profile

Steps to reproduce the bug

Deploy the latest Gloo Edge on Kubernetes 1.25 in a cluster set up with the restricted PSS profile.
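
For reference, a namespace that enforces the restricted profile can be sketched as below. The `pod-security.kubernetes.io/*` labels are the standard Pod Security Admission labels; the namespace name `gloo-system` and the pinned version are assumptions for illustration, not details taken from this report.

```yaml
# Sketch only: namespace name and enforce-version are assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: gloo-system
  labels:
    # Reject any pod that violates the restricted Pod Security Standard.
    pod-security.kubernetes.io/enforce: restricted
    # Pin the policy version to the cluster version under test.
    pod-security.kubernetes.io/enforce-version: v1.25
```

Installing the chart into a namespace labeled this way should surface the violations listed above as pod admission errors.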

Additional Environment Detail

No response

Additional Context

Additional Context: link to PSS doc

Related Issues

  • [x] https://github.com/solo-io/gloo/issues/8455


ably77 avatar Nov 06 '23 18:11 ably77

Note: gloo-ee/templates/70-resource-rollout-job.yaml was removed in https://github.com/solo-io/solo-projects/pull/5491/files

sheidkamp avatar May 14 '24 01:05 sheidkamp

@ably77 - question on "9 - Several helm-hooks do not set resource request/limits":

I don't see anything about resource/request limits in the Pod Security Standards. Is this specifically needed for meeting PSS/deploying with a restricted profile, or is this more generally part of requested helm updates?

sheidkamp avatar May 14 '24 01:05 sheidkamp

OSS changes have entered PR.

In addition to adding support for configuring the individual container securityContexts, I have added a flag, global.podSecurityStandards.container.enableRestrictedContainerDefaults, that defaults all container securityContexts to the following securityContext, which applies the minimal changes needed to meet the Restricted Pod Security Standards:

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
    - ALL

Template-specific defaults will be applied to this context.
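
As a rough sketch, turning the flag on from a Helm values file would look like the following. Only the key path comes from the comment above; passing it through a values file rather than `--set` is an assumption for illustration.

```yaml
# Sketch only: key path as named in this comment; everything else left at chart defaults.
global:
  podSecurityStandards:
    container:
      # Default every container securityContext to the restricted-compatible one shown above.
      enableRestrictedContainerDefaults: true
```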

sheidkamp avatar May 16 '24 20:05 sheidkamp

> @ably77 - question on "9 - Several helm-hooks do not set resource request/limits":
>
> I don't see anything about resource/request limits in the Pod Security Standards. Is this specifically needed for meeting PSS/deploying with a restricted profile, or is this more generally part of requested helm updates?

Hey @sheidkamp, sorry I missed this. I don't think it's a hard requirement that is strictly enforced, but making resources configurable is a generally recommended best practice for most organizations, so this is more of the "generally part of requested helm updates" case.

In practice, I think we'll see a tool such as OPA, Kyverno, or another admission controller block a Pod that has no resources defined from being deployed.
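
To illustrate the kind of rule meant here, below is a minimal sketch of a Kyverno policy that rejects Pods whose containers lack resource requests and a memory limit. It is modeled on Kyverno's published sample policy for this purpose; the policy name and message are illustrative, and field names may differ between Kyverno versions, so treat it as an assumption rather than a tested manifest.

```yaml
# Sketch only: an example of an admission policy that would block hook pods without resources.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  # Enforce blocks non-compliant pods instead of only reporting them.
  validationFailureAction: Enforce
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              # "?*" matches any non-empty value.
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    memory: "?*"
```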

ably77 avatar May 16 '24 20:05 ably77

@sheidkamp : great that this got fixed! Is this also covering extauth (this is not visible in the PR)? See https://github.com/solo-io/gloo/issues/8455#issuecomment-1631888657

anessi avatar May 17 '24 07:05 anessi

@ably77 - extauth will be covered in the EE PR that relies on the OSS PR.

For resource limits, is that needed at the container level, basically the same scope as the security contexts?

sheidkamp avatar May 17 '24 14:05 sheidkamp

Resource limits also seem dangerous to enforce, given that most of these commands are highly dependent on a customer's environment. @ably77, can you move that part to a separate RFE? It's not cut and dried, and it is potentially a dangerous update.

nfuden avatar May 17 '24 14:05 nfuden

I don't think we need to strictly set requests/limits by default, but we should allow them to be configured by a user who wants to.

ably77 avatar May 17 '24 15:05 ably77

We will consider this. Although everything can already technically be overridden with kustomize, we can check whether there is a cleaner update.

nfuden avatar May 17 '24 19:05 nfuden

@ably77 - looking for some additional clarifications, I see we set the resources in the 5-/6.5-/19- jobs (for example with gateway.cleanupJob.resources).
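
A sketch of what that existing knob looks like in a values file, assuming the chart passes the block through as a standard Kubernetes resources stanza. The key path `gateway.cleanupJob.resources` is the one named above; the quantities are placeholders, not recommendations.

```yaml
# Sketch only: placeholder quantities; tune for the target environment.
gateway:
  cleanupJob:
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
      limits:
        memory: 128Mi
```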

Can you give examples (or a full list) of the hooks that need this configuration?

sheidkamp avatar May 17 '24 20:05 sheidkamp

The container security changes have been merged into EE/solo-projects main (will be part of the 1.17.0-beta3 release) and the 1.16.x branch (will be part of the 1.16.10 release)

As requested in https://github.com/solo-io/gloo/issues/8864#issuecomment-2117726015, please open another RFE for the resource limits, ideally with clarifications requested in https://github.com/solo-io/gloo/issues/8864#issuecomment-2118323654

sheidkamp avatar May 23 '24 01:05 sheidkamp