
Why is Istio's sidecar disabled in build pods?

Open adamglt opened this issue 5 years ago • 11 comments

Running builds without a sidecar negates all Istio features - mTLS, RBAC, Egress control, etc.

Looks like the always-off setting was added here: https://github.com/knative/build/pull/74, but the issue/PR doesn't specify why.

Does the sidecar break builds for some reason we're missing?
If not, shouldn't this be configurable?

see https://github.com/knative/build/issues/369#issuecomment-425496280

@mattmoor @ian-mi

/area API /kind dev

adamglt avatar Sep 20 '18 05:09 adamglt

@adamglt :

One reason I can think of would be to avoid pod startup delays.

Build pods are not using any Istio features either, so there is no benefit yet. What would be your use case for adding the sidecar?

shashwathi avatar Sep 20 '18 18:09 shashwathi

My use case is the same as any Istio controlled pod - traffic control, observability, and security. For example:

  • Controlling API-based traffic with RBAC (builds may call other infra services)
  • Dynamically routing data pulls (e.g. pull artifacts from a cache)
  • mTLS to those services, with the attached ServiceAccount's SPIFFE-ID

adamglt avatar Sep 20 '18 20:09 adamglt

/assign @mattmoor @ian-mi

adamglt avatar Sep 27 '18 04:09 adamglt

Looking at the code - the reason seems to be that all build steps run as init containers, whereas istio-proxy is a "real" container.

I guess this is necessary to let steps run one by one in a specific order.
This seems to mean that running builds with actual Istio integration is pretty much impossible...
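
To illustrate the clash, here's a rough sketch (Go, using the upstream k8s API types) of the container layout we end up with - the names and images are placeholders, not what knative/build or Istio actually generate:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Rough shape of a build pod: every step is an init container, so the
	// steps run strictly one after another, while an injected istio-proxy
	// would be an ordinary container. Names and images are placeholders.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "example-build-pod"},
		Spec: corev1.PodSpec{
			InitContainers: []corev1.Container{
				{Name: "build-step-git-clone", Image: "example.com/git-init"},
				{Name: "build-step-build-and-push", Image: "example.com/builder"},
			},
			Containers: []corev1.Container{
				// If injection were on, istio-proxy would be added here - but
				// ordinary containers only start after all init containers exit,
				// so the proxy would never be up while the steps run.
				{Name: "istio-proxy", Image: "istio/proxyv2"},
			},
		},
	}
	fmt.Printf("%s: %d steps as init containers, %d regular containers\n",
		pod.Name, len(pod.Spec.InitContainers), len(pod.Spec.Containers))
}
```

Since the kubelet only starts regular containers after every init container has exited, an injected proxy would never be running while the steps execute.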

Is there any intention to move from init containers to something else? (some pseudo-pause container?)

adamglt avatar Sep 28 '18 16:09 adamglt

@adamglt That's not something we've considered so far, but we could if using init containers blocks some important functionality.

imjasonh avatar Sep 28 '18 16:09 imjasonh

@ImJasonH The main functionality we're interested in is running builds for semi-trusted tenants.
Basically the build-environment version of a multi-tenant cluster.

For example, we might whitelist a certain tenant to access service X during builds, but not another tenant. Actual access control can go down to specific builds.
What we currently do is control external calls with Istio's RBAC, which is relatively easy.
NetworkPolicy capabilities are a lot weaker in comparison.

I've just seen this issue discussing something similar: https://github.com/knative/build/issues/32 but I'm a bit surprised there isn't more interest.
What's stopping builds from accessing random endpoints in the cluster / over the internet? Is everyone running "fully trusted" builds?

adamglt avatar Sep 28 '18 17:09 adamglt

We're definitely interested in making Build suitable in a multi-tenant environment, and it should definitely be possible to limit what a build can reach, and what can reach the build. It's not an area we've spent a lot of time on so far, so your use cases are going to be really helpful in polishing this. #32 doesn't have a lot of answers so far, mostly just questions and ideas.

It would be great not to require Istio at all, and still be able to ensure a secure, trustworthy multi-tenant build environment. I admit I'm not very familiar with the limits of k8s's built-in mechanisms, or how Istio's are better - can you elaborate on how network policies (and RBAC?) are weaker?

imjasonh avatar Sep 28 '18 17:09 imjasonh

Sure, it's a big topic, but here are the main points I can think of:

  • Network Access Control

    • k8s does this with NetworkPolicy (not really k8s, but CNI implementations like Calico). NetworkPolicy is L4 - so it's strictly "this set of pods can reach that set of pods/namespaces/CIDRs".
    • Istio does this with a combination of a few resources, but mostly Istio RBAC (I think it's officially called Istio Authorization). Istio's rules are L7, so much more granular control is possible.
    • k8s NetworkPolicies use selectors to identify subjects/targets, while Istio allows ServiceAccounts+SPIFFE-IDs, which are essentially strong service identities (X.509 certs).
  • Authorization (RBAC)

    • k8s RBAC deals with workload-to-k8s access control (e.g. service A can list pods - see the sketch just after this list).
    • Istio RBAC deals with workload-to-workload access control (e.g. service A can call service B at endpoint X).
    • Fully controlled and isolated builds require usage of both workload-to-k8s and workload-to-workload authorization.
  • Authentication

    • Istio allows for transparent (proxy-to-proxy) mTLS with strong service identities.
    • k8s doesn't deal with service-to-service auth at all; it's the user's responsibility to inject secrets, call secure APIs, etc.
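
To make the Authorization point above concrete, here's a rough sketch of the workload-to-k8s half using plain k8s RBAC (Go with the upstream API types; all the names are hypothetical) - note there's no k8s-native object at all for the workload-to-workload half:

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Workload-to-k8s: "this ServiceAccount may list pods". Names are hypothetical.
	role := &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "pod-reader", Namespace: "tenant-a-builds"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""}, // core API group
			Resources: []string{"pods"},
			Verbs:     []string{"get", "list"},
		}},
	}
	binding := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "build-pod-reader", Namespace: "tenant-a-builds"},
		Subjects: []rbacv1.Subject{{
			Kind:      rbacv1.ServiceAccountKind,
			Name:      "tenant-a-builder", // hypothetical build ServiceAccount
			Namespace: "tenant-a-builds",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "Role",
			Name:     role.Name,
		},
	}
	// There is no k8s-native equivalent for the workload-to-workload half
	// ("this ServiceAccount may call service B at endpoint X") - that's the
	// gap Istio's authorization fills.
	fmt.Println(role.Name, binding.Name)
}
```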

In general, if you give up on the granularity Istio gives you, it's probably possible to restrict build network access using nothing but NetworkPolicy.
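
Here's roughly what that NetworkPolicy-only approach looks like (again a sketch with the upstream Go types; the namespace, labels, CIDR, and port are all made up) - the point being that everything expressible here is L4:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// buildEgressPolicy sketches an egress-only NetworkPolicy for build pods in a
// hypothetical per-tenant namespace. All selectors, CIDRs, and ports are made up.
func buildEgressPolicy() *networkingv1.NetworkPolicy {
	tcp := corev1.ProtocolTCP
	https := intstr.FromInt(443)
	return &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "restrict-build-egress",
			Namespace: "tenant-a-builds",
		},
		Spec: networkingv1.NetworkPolicySpec{
			// Hypothetical label - select whatever labels your build pods carry.
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "tenant-a-build"},
			},
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeEgress},
			Egress: []networkingv1.NetworkPolicyEgressRule{{
				// L4 only: "these pods may reach this CIDR on TCP/443".
				// There is no way to say "only GET /artifacts/* on service X",
				// which is the kind of rule Istio's L7 authorization can express.
				To: []networkingv1.NetworkPolicyPeer{{
					IPBlock: &networkingv1.IPBlock{CIDR: "10.96.0.0/16"},
				}},
				Ports: []networkingv1.NetworkPolicyPort{{Protocol: &tcp, Port: &https}},
			}},
		},
	}
}

func main() {
	fmt.Println(buildEgressPolicy().Name)
}
```

Identity-wise it's also just "pods matching these labels", rather than the ServiceAccount-backed SPIFFE identity Istio's rules can key off.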

However, I think that organizations adopting Knative (and Istio in general) will probably drop a lot of L4 and in-app-TLS functionality for Istio's L7+mTLS, so having a completely different set of authentication/authorization methods for everything running under knative-build doesn't make a lot of sense.

Beyond the issue of "should Istio be enabled", there's the issue of "can Istio be enabled".
AFAIK there's no way to force containers to run sequentially in a Pod unless they're init containers, and each init container also has to run to completion before the next init container - or any "real" container - starts.
This obviously clashes with istio-proxy, which is just a regular sidecar injected into the Pod.

Seems to me like the only real ways to get around this are:

  • Have some kind of init-container-like functionality for regular containers in k8s (unlikely).
  • Run Istio as a node-level daemon instead of a sidecar (possibly an option with VMs; I don't think it's possible in k8s - and either way, you lose a lot of functionality that way).
  • Split steps up into separate Pods, instead of multiple containers in one Pod. This could probably work, but it introduces a bunch of other issues - like writing coordination/aggregation logic, sharing workspaces (volumes) across Pods that run on random nodes, etc.

Hopefully I'm missing some other magic solution - maybe some of the Istio people can weigh in. @rshriram maybe?

adamglt avatar Sep 29 '18 07:09 adamglt

(haven't caught up on this thread, just chiming in with historical context)

/unassign @mattmoor /unassign @ian-mi /assign @tcnghia

@adamglt The history here is that when we first started enabling sidecars it blocked all egress traffic, which is a problem when the first thing you do is a git clone https://github.com/.... Given this limitation of Istio, this is likely the right setting for 99+% of builds.

That said, I don't want serving to need to know about these annotations, let alone build. If Istio had a sensible default mode that let it inject sidecars without being disruptive to the software it's instrumenting, then we wouldn't need this.

If there is such a setting, then I'd love to know about it. I am by no means an expert on Istio, but perhaps you could join our Networking WG (led by @tcnghia, who is!) on Thursday mornings to discuss?

mattmoor avatar Oct 02 '18 14:10 mattmoor

It looks like with this change https://github.com/istio/istio/pull/6406 in Istio, we may be able to remove the annotations relating to sidecar injection from our Pods. Instead, we should leave all injection policy choices to users so they can fit them to their use case.

That will also mean we need to document how these can be applied, and the caveats (like configuring egress exceptions), so that users can make the best choice for themselves.

tcnghia avatar Oct 02 '18 18:10 tcnghia

@mattmoor hey, thanks for jumping in.

About Istio's auto-injection: I think Istio just ignores namespaces without the istio-injection=enabled label, so that should be good enough (but only if people are running builds in build-dedicated namespaces).
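
For reference, here's a sketch of the two knobs involved (Go, k8s API types; the namespace name is hypothetical, and I'm assuming the always-off setting from knative/build#74 is the standard per-pod annotation):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Namespace-level opt-in to automatic injection (hypothetical namespace name).
	buildNS := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "tenant-a-builds",
			Labels: map[string]string{"istio-injection": "enabled"},
		},
	}

	// Per-pod override via the standard annotation. I'm assuming this is the
	// always-off setting that knative/build hard-codes on build pods today,
	// and it's what this issue proposes making configurable.
	buildPodMeta := metav1.ObjectMeta{
		Name:        "example-build-pod", // placeholder
		Namespace:   buildNS.Name,
		Annotations: map[string]string{"sidecar.istio.io/inject": "false"},
	}

	fmt.Println(buildNS.Labels["istio-injection"], buildPodMeta.Annotations["sidecar.istio.io/inject"])
}
```

If the annotation weren't hard-coded, the namespace-level label would decide - which sounds like what's being proposed above.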

About the git clones failing, yeah that makes sense. I mean, in our case we'd like them to fail (we'd whitelist specific endpoints), but that's not really a "sane default".

Anyway, as I wrote above, I just realized builds were never really running or using the Istio sidecar, since it's a regular container and all build steps are init containers.
What I believe you've experienced is Istio injecting istio-init - the init container that sets up the iptables rules forwarding all traffic to istio-proxy (the sidecar) - but the sidecar was never there, so every call probably failed.

In any case, this is a pretty tricky situation. The whole model of sidecars vs init-containers is clashing pretty hard - so one of [knative, istio, k8s] has to make a pretty big change to make it all fit together.

adamglt avatar Oct 02 '18 19:10 adamglt