Envoy custom settings for securityContext.

Open granwizzard opened this issue 2 years ago • 14 comments

Hello,

I'm having trouble complying with security best practices when deploying OSM through the Azure portal, the Az CLI, or directly on the cluster. We follow a security baseline in which pods must comply with several policies. One of these policies, which enforces dropping certain capabilities, disallows running as root, and so on, is causing sidecar creation to fail.

Message: Error creating: admission webhook "validation.gatekeeper.sh" denied the request:
[azurepolicy-k8sazureallowedusersgroups-95xxx00xxdefa68xxx37] Container osm-init is attempting to run as disallowed user 0. Allowed runAsUser: {"ranges": [], "rule": "MustRunAsNonRoot"}
[azurepolicy-container-allowed-capabilities-41f271b4a816bd8cb11b] container <envoy> is not dropping all required capabilities. Container must drop all of ["NET_RAW"] or "ALL"
[azurepolicy-container-allowed-capabilities-41f271b4a816bd8cb11b] container <osm-init> is not dropping all required capabilities. Container must drop all of ["NET_RAW"] or "ALL"
[azurepolicy-container-allowed-capabilities-41f271b4a816bd8cb11b] container <osm-init> has a disallowed capability. Allowed capabilities are []

How can we change the sidecar settings to be in compliance?

Thank you for your support.

granwizzard avatar Jun 07 '22 16:06 granwizzard

We should probably default this to run as non-root.

If we did this, do you need further customization?

steeling avatar Jun 14 '22 18:06 steeling

I'll call out that this is technically the initContainer and not the sidecar; the init container needs some root privileges (i.e. NET_RAW and NET_ADMIN) in order to change iptables rules, but it doesn't have to run as root to do so

keithmattix avatar Jun 14 '22 18:06 keithmattix

> We should probably default this to run as non-root.
>
> If we did this, do you need further customization?

This would be a step in the right direction, but unfortunately not enough. With zero-trust models, stricter security postures, and new vulnerabilities being found all the time, further changes and tweaks will be needed sooner or later.

granwizzard avatar Jun 16 '22 07:06 granwizzard

> I'll call out that this is technically the initContainer and not the sidecar; the init container needs some root privileges (i.e. NET_RAW and NET_ADMIN) in order to change iptables rules, but it doesn't have to run as root to do so

We run OSM itself in a namespace that is excluded from the policy. The issues start as soon as we onboard a new namespace to OSM, because the policy is mandatory for every namespace except the one where OSM runs.

If it is true that NET_RAW or NET_ADMIN must be allowed, how can we apply security best practices to K8s clusters?

If we have hundreds of namespaces to onboard to OSM and we need to allow these capabilities, how can we fulfill the company security policy and at the same time provide this service mesh?

I have already tested several service meshes and all of them require NET_RAW. I was hoping OSM could be different and comply with the latest security best practices.

Any suggestion will be much appreciated, because it seems every service mesh will fail to comply with the company's security policies unless the company accepts the risk, and for hundreds of containers it will be difficult for a company to accept such a high risk.

Thank you.

granwizzard avatar Jun 16 '22 20:06 granwizzard

Can your security policies have an exception list based on container name? Only a single container (the initcontainer) requires those privileges.

If that's not workable, then we can look into moving the initialization logic requiring those privileges to a CNI plugin so that each individual pod in the mesh doesn't have to have a container with such high privileges

keithmattix avatar Jun 17 '22 00:06 keithmattix

Hello @keithmattix,

A very good incentive to grow the adoption of OSM (at least for AKS) would be to make sure that the add-on deploys a mesh that is compliant by default with the Azure Policies. Today the Azure policy add-on and the OSM one are mutually exclusive. This could be a show stopper in many organizations.

Best Regards

stephaneey avatar Jun 20 '22 06:06 stephaneey

> Can your security policies have an exception list based on container name? Only a single container (the initcontainer) requires those privileges.

Does this init container always have the same name? In my case, I'm testing this using the Defender for Cloud built-in policy for K8s.

> If that's not workable, then we can look into moving the initialization logic requiring those privileges to a CNI plugin so that each individual pod in the mesh doesn't have to have a container with such high privileges

This is a good idea if possible because the end goal is not to exclude but to be in compliance. In my case, we are using Kubenet and not CNI, because of networking constraints, so it sounds like even if you implement your suggestion it will not be compatible with Kubenet, right?

Thank you for your help and support.

granwizzard avatar Jun 20 '22 08:06 granwizzard

> This is a good idea if possible because the end goal is not to exclude but to be in compliance. In my case, we are using Kubenet and not CNI, because of networking constraints, so it sounds like even if you implement your suggestion it will not be compatible with Kubenet, right?

Istio does this with their CNI plugin, which can be used with Calico behind the scenes. Calico is also a managed add-on. It'd be great if OSM could do something similar, without enforcing Azure CNI which is indeed problematic with IPs.

stephaneey avatar Jun 20 '22 08:06 stephaneey

I'm looking to work with the Az Policy team for OSM, but I want to confirm a statement here. As @keithmattix mentioned, the only container with root is the init container. Once the init container has programmed the iptables rules, it goes away. Are you saying Az Policy is still flagging OSM workloads as running with privileges? I need to check when the policy next inspects the pods' privileges, but I would expect the workload to become compliant at the next check.

phillipgibson avatar Jun 23 '22 15:06 phillipgibson

Hi Phillip,

All our containers have to run with the following security context as the minimum hardening.

      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        capabilities:
          drop:
            - NET_RAW
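
To make the baseline concrete, here is a minimal Python sketch of the three checks that the denial message earlier in this thread reports (non-root, required dropped capabilities, no added capabilities). It is only an illustration under stated assumptions: `baseline_violations` and the sample container dicts are hypothetical, and this is not how Gatekeeper or Azure Policy is actually implemented.

```python
# Hypothetical helper: checks one container spec (as a plain dict) against
# the baseline described above. Not the real Gatekeeper/Azure Policy logic.
REQUIRED_DROP = {"NET_RAW"}  # capabilities the baseline says must be dropped


def baseline_violations(container: dict) -> list:
    """Return a list of human-readable violations for one container."""
    name = container.get("name", "<unnamed>")
    ctx = container.get("securityContext") or {}
    caps = ctx.get("capabilities") or {}
    dropped = {c.upper() for c in caps.get("drop") or []}
    added = {c.upper() for c in caps.get("add") or []}

    problems = []
    # Must declare runAsNonRoot and must not explicitly request UID 0.
    if ctx.get("runAsNonRoot") is not True or ctx.get("runAsUser") == 0:
        problems.append(f"{name}: must run as non-root")
    # Must drop every required capability, or drop ALL.
    if "ALL" not in dropped and not REQUIRED_DROP <= dropped:
        problems.append(f"{name}: must drop {sorted(REQUIRED_DROP)} or ALL")
    # The allowed-capabilities list is empty, so any added capability fails.
    if added:
        problems.append(f"{name}: adds disallowed capabilities {sorted(added)}")
    return problems


# A container that follows the baseline shown above.
compliant = {
    "name": "app",
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 1000,
        "capabilities": {"drop": ["NET_RAW"]},
    },
}

# A made-up init container that runs as root and adds network capabilities.
init_like = {
    "name": "osm-init",
    "securityContext": {
        "runAsUser": 0,
        "capabilities": {"add": ["NET_ADMIN", "NET_RAW"]},
    },
}

print(baseline_violations(compliant))
for problem in baseline_violations(init_like):
    print(problem)
```

The second sample is an assumption chosen to resemble the denied `osm-init` container; it is not copied from OSM's actual manifest.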

In some cases, we also allow only container images that come from specific container registries, where we can scan for vulnerabilities and address them.

Basically, we have the Azure Defender for Cloud policy applied, and we are following its recommendations. If needed, I can export the policy initiative.

Thanks

granwizzard avatar Jun 27 '22 09:06 granwizzard

Hey @keithmattix @phillipgibson

As explained earlier, the OSM and Azure Policy add-ons are mutually exclusive: the policy denies anything OSM tries to inject. This is easy to test: just enable the Azure Policy add-on on your cluster. Fine-tuning one of the policies (allowing NET_ADMIN and NET_RAW) should do the trick, but you have to do that for every namespace that is part of the mesh. This results in a setup that does not respect the principle of least privilege.

stephaneey avatar Jun 27 '22 13:06 stephaneey

Hey @keithmattix @phillipgibson @granwizzard

Just to let you know that I discussed this with fellow experts. CNI clearly remains the best way to avoid granting elevated privileges to containers, but Azure Policy can be fine-tuned to exclude the osm-init and envoy containers from evaluation. I had overlooked this because the built-in initiative does not allow it. If you create your own initiative, you can get finer-grained exclusions. It's still not perfect, but it is a way forward. By the way, I think it's better to exclude container images rather than container names, since the image reference contains the full image path. So, defining this:

excludedImages:
    - mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1
    - mcr.microsoft.com/oss/openservicemesh/init:v1.0.0

in your own initiative, lets OSM's injection and initialization work. Of course, this means that the values need to be updated whenever you upgrade your OSM version.
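
As a rough illustration of why the list has to be updated on every upgrade, the following Python sketch filters a pod's containers against an exclusion list. It assumes exact string matching on the image reference, which may not be how the real policy matches images; `containers_to_evaluate`, the sample pod, the `registry.example.com` image, and the `hypothetical-next` tag are all made up for the example.

```python
# Assumption: exclusion is an exact match on the full image reference.
# The real Azure Policy matching rules may differ (for example, wildcards).
EXCLUDED_IMAGES = [
    "mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1",
    "mcr.microsoft.com/oss/openservicemesh/init:v1.0.0",
]


def containers_to_evaluate(pod_spec: dict, excluded_images: list) -> list:
    """Return names of containers the policy would still evaluate."""
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [c["name"] for c in containers if c["image"] not in excluded_images]


# Hypothetical meshed pod: one application container plus the injected ones.
pod = {
    "initContainers": [
        {"name": "osm-init", "image": "mcr.microsoft.com/oss/openservicemesh/init:v1.0.0"},
    ],
    "containers": [
        {"name": "app", "image": "registry.example.com/app:1.0"},
        {"name": "envoy", "image": "mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1"},
    ],
}

# Same pod after an imagined upgrade that changes the init image tag.
upgraded = dict(
    pod,
    initContainers=[
        {"name": "osm-init", "image": "mcr.microsoft.com/oss/openservicemesh/init:hypothetical-next"},
    ],
)

print(containers_to_evaluate(pod, EXCLUDED_IMAGES))
print(containers_to_evaluate(upgraded, EXCLUDED_IMAGES))
```

With exact matching, only `app` is evaluated for the first pod, while the upgraded pod's `osm-init` falls back under evaluation until the list is refreshed.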

That said, a CNI integration would still be very interesting.

Best Regards

stephaneey avatar Jul 06 '22 17:07 stephaneey

We'll work on getting the documentation updated with this info for the time being.

phillipgibson avatar Jul 06 '22 19:07 phillipgibson

Hi @stephaneey,

I'm already testing the workaround. I now receive only one denial message, which relates to the securityContext: the osm-init container needs to satisfy "MustRunAsNonRoot" in the manifest.

Thanks.

granwizzard avatar Jul 07 '22 08:07 granwizzard

Since we have a workaround for Azure Policy and CNI work is being tracked in #1610, I'm going to close this

keithmattix avatar Sep 07 '22 19:09 keithmattix