It is possible to bypass the queue-proxy
/area networking
What version of Knative?
v0.11.0-125-gb8f7090cc
Expected Behavior
From within the cluster, it shouldn't be possible to connect directly to the user container and bypass the queue proxy for a revision.
I would expect that this would be prohibited, possibly via network policy.
Actual Behavior
Port 8080 of the user container is exposed and available
Note: this only works if the revision is scaled to 1 or more instances already
Steps to Reproduce the Problem
Deploy a knative service ("webapp" in my example)
Get the PodIP:
```
% kubectl get pods/webapp-mrpn8-deployment-6559dcff9b-c2pxx -oyaml | grep "podIP:"
podIP: 10.36.0.28
```
From on-cluster, I am able to connect directly to port 8080 on that pod:
```
# curl http://10.36.0.28:8080
<html>
<head>
<title>Hello there...</title>
</head>
```
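For reference, a minimal Go sketch of the same check (the pod IP is the hypothetical one from above, and the program has to run from a pod or node inside the cluster):

```go
// A minimal sketch of the direct-to-pod request above, bypassing the queue-proxy.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical pod IP of the revision pod; substitute the one from `kubectl get pods`.
	resp, err := http.Get("http://10.36.0.28:8080")
	if err != nil {
		log.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("reading body failed: %v", err)
	}
	fmt.Printf("status: %s\n%s", resp.Status, body)
}
```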
Something like the following NetworkPolicy might work (untested):
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchExpressions:
        - key: serving.knative.dev/release
          operator: Exists
      podSelector:
        matchLabels:
          app: activator
    ports:
    - protocol: TCP
      port: queue-port
```
(and a similar NetworkPolicy to allow traffic from the HTTP load-balancers, which would be network stack specific)
/kind good-first-issue
Note that NetworkPolicies are additive, so it should be possible to apply the above NetworkPolicy alongside a NetworkPolicy that allows the load-balancer traffic.
Note that there are no good namespace labels to select on, at least for the default suggested Istio installation...
Ref: https://kubernetes.io/docs/concepts/services-networking/network-policies/
Is it something we want to enforce? We could simply document it: tell customers not to open a socket on all interfaces, but only on loopback. It could go in the spec along with listening on the port provided by the PORT env variable.
i.e.
`http.ListenAndServe("localhost:8080", nil)`
instead of
`http.ListenAndServe(":8080", nil)`
And we should start with our samples: https://github.com/knative/docs/blob/master/docs/serving/samples/hello-world/helloworld-go/helloworld.go#L30.
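To make that concrete, here is a minimal sketch of a hello-world style server that honors the PORT env variable Knative injects and binds only to loopback (illustrative, not the actual sample code; the queue-proxy shares the pod's network namespace, so it can still reach the container over localhost):

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
)

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello World!")
}

func main() {
	// Knative provides the port to listen on via the PORT env variable.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	http.HandleFunc("/", handler)

	// Bind to loopback only: the queue-proxy in the same pod can still reach
	// us via localhost, but other pods on the cluster network cannot.
	addr := net.JoinHostPort("localhost", port)
	log.Printf("listening on %s", addr)
	log.Fatal(http.ListenAndServe(addr, nil))
}
```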
Code I used to validate what I described above: https://github.com/JRBANCEL/Experimental/tree/master/IsolateContainerInsidePod
Requiring all containers to be conscious of network security seems like a problem, particularly if users end up reusing common containers (e.g. hugo for serving a website with the website layer added to a base hugo image).
/assign mpetason
@evankanderson I'm going to try this issue and see if I can resolve it. Would the first step be to get this working for the default namespace, then discuss the best way to select which namespaces should be covered? If we weren't using additional namespaces after installation we could require it in the config, but users will probably create additional namespaces besides default. I'm getting up to speed on how some of this works; I'll see if I can get it to block in default first (should be easy with the examples) and then maybe push a PR to discuss additional namespaces.
Is this only surfaced because the user-port is part of the containerPorts list in K8s? What happens if we remove it from that list during deployment? Is the port still surfaced?
Tried it, even if we don't surface it, 8080 is curlable.
Yes, it doesn't matter. Only listening on loopback prevents external connections from being established. See my comment above.
well, or a network rule, as Evan mentioned (definitely for meshes).
I did some testing but wasn't able to find where the traffic comes from after it gets cut over from the activator. I tested out a few different sources, but the main one I tested was istio-ingressgateway.
Here's the Network Policy that I thought would work:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy-lb
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: istio-system
    - podSelector:
        matchLabels:
          app: istio-ingressgateway
    ports:
    - protocol: TCP
      port: queue-port
```
Activator Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchExpressions:
        - key: serving.knative.dev/release
          operator: Exists
      podSelector:
        matchLabels:
          app: activator
    ports:
    - protocol: TCP
      port: queue-port
```
Test steps: I changed target-burst-capacity to 0 so the activator would be removed from the path once a revision was up (Victor recommended setting minScale=1 instead), then applied both network policies. The first hit works as it should because of the activator NetworkPolicy; then the pod scales to 1 and traffic fails. I expected the Istio NetworkPolicy to allow the traffic, but it fails with "upstream connect error or disconnect/reset before headers. reset reason: connection failure".
I checked a few different logs but didn't see anything useful. I might not be looking in the right place.
Any ideas on where traffic is coming from after the revisions scale?
To be precise, Victor recommended applying minScale=1 in addition to tbc=0, not instead of it.
I asked internally and had a response that might work out as well -
"for this use case wouldn't just allow-from-all to queuePort work - i.e. no need for a from: .. labels at all? that'd block all the other ports, I think, and let queuePort only through - iiuc we're ok with direct access to queuePort from anywhere so that seems to do what we want?"
Tested this out using a policy that only specifies allowable ports, but allows traffic from anywhere.
Looks like it works when hitting the endpoint from a browser.
It fails when trying to curl from another pod in the environment using the podIP and 8080.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    ports:
    - protocol: TCP
      port: queue-port
```
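For reference, a hedged sketch of the kind of check an integration test could make with this policy in place (the pod IP is hypothetical and would come from the test harness, and 8012 is assumed to be the queue-proxy's queue-port; this is not the project's actual e2e framework):

```go
package e2e

import (
	"fmt"
	"net/http"
	"testing"
	"time"
)

// reachable reports whether a plain GET to the URL connects at all.
func reachable(url string) bool {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return true
}

// TestDirectPodAccess assumes it runs inside the cluster against a known Revision pod.
func TestDirectPodAccess(t *testing.T) {
	podIP := "10.36.0.28" // hypothetical; a real test would discover this via the Kubernetes API

	// Direct access to the queue-proxy port should still be allowed by the policy.
	if !reachable(fmt.Sprintf("http://%s:8012", podIP)) {
		t.Errorf("expected the queue-port (8012) to be reachable")
	}

	// Direct access to the user container should now be blocked.
	if reachable(fmt.Sprintf("http://%s:8080", podIP)) {
		t.Errorf("expected direct access to the user container (8080) to be blocked")
	}
}
```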
Can we try adding this and run our integration tests? If things work fine, we can add an additional test that bypasses the queue-proxy (hits the user container directly) and confirms failure.
Since this allows all traffic to the queue-port and only selects Knative revision pods, should this be something that is set up during install (config file) instead of for each revision?
Yes, this looks like a one-time setup during install, so no need to do it per Revision. Which is also great because there's no code change, and users who dislike the restriction could easily remove it themselves (and potentially replace it with something else to their liking).
The NetworkPolicy will be applied for default, but it doesn't cover other namespaces. We have a couple of options that I can think of; please feel free to add any others I might not have thought of:
- Use this during the original setup and cover the default namespace. Add documentation explaining how this works and recommend that the user applies it to other namespaces.
- Apply the NetworkPolicy per revision so that it will cover all namespaces. Probably have to turn this into an option eventually in case end users want to turn it off.
Looking for feedback/ideas about how we want this to work.
Another option: one NetworkPolicy per namespace. When creating a new revision, check if a NetworkPolicy is set up for the namespace; if there isn't one, create it.
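A rough sketch of that per-namespace check using client-go (the policy name and function are hypothetical, the context-aware Get/Create signatures assume a recent client-go, and a real implementation would live in the revision reconciler rather than a standalone helper):

```go
package netpolicy

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

const policyName = "knative-serving-user-container-policy" // hypothetical name

// EnsureNamespacePolicy creates the allow-queue-port-only NetworkPolicy in the
// given namespace if it does not already exist. Intended to be called when a
// Revision is reconciled.
func EnsureNamespacePolicy(ctx context.Context, kc kubernetes.Interface, ns string) error {
	_, err := kc.NetworkingV1().NetworkPolicies(ns).Get(ctx, policyName, metav1.GetOptions{})
	if err == nil {
		return nil // already present, nothing to do
	}
	if !apierrors.IsNotFound(err) {
		return err
	}

	tcp := corev1.ProtocolTCP
	queuePort := intstr.FromString("queue-port")
	policy := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: policyName, Namespace: ns},
		Spec: networkingv1.NetworkPolicySpec{
			// Select every Revision pod in the namespace.
			PodSelector: metav1.LabelSelector{
				MatchExpressions: []metav1.LabelSelectorRequirement{{
					Key:      "serving.knative.dev/revision",
					Operator: metav1.LabelSelectorOpExists,
				}},
			},
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
			// Allow traffic from anywhere, but only to the queue-proxy port.
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				Ports: []networkingv1.NetworkPolicyPort{{Protocol: &tcp, Port: &queuePort}},
			}},
		},
	}
	_, err = kc.NetworkingV1().NetworkPolicies(ns).Create(ctx, policy, metav1.CreateOptions{})
	return err
}
```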
I'll be working on the one network policy per namespace idea for now. The PR is WIP currently.
A big -1 on adding default NetworkPolicy objects out of the box.
In my experience, it's quite hard to come up with an appropriate NetworkPolicy out of the box that won't have unwanted side-effects. Today, with us shipping no NetworkPolicy objects, users can install their own if they want a Knative Service to be isolated from specific other namespaces, specific other pods, or so on.
The linked PR #7700, for example, adds a NetworkPolicy that allows any traffic from any pod or any namespace to the queue-proxy port of every Knative Revision pod. So, with that in place, how would a user now decide that certain Knative Revision pods should only be accessible from specific other pods? Specific other namespaces? By trying to lock things down so you can only hit the queue-proxy (a relatively minor problem), we've opened up a much bigger can of worms by impacting the ability of users to actually restrict traffic to their Knative Services as needed.
@bbrowning I agree, it will be more difficult to control traffic. Maybe we can get more feedback on how we want to approach this. It was suggested that we document this, but also that maybe we should be proactive and enforce it.
If we document it then we could have a sample network policy that will block all traffic except for what is destined for queue-proxy.
I'm open to anything. I'm looking for issues to work on to help :P
Perhaps @mikehelmick or someone else can elaborate on why a user shouldn't be able to bypass the queue-proxy. Is it a concern around bad actors? Accidental misuse of Serving? Conformance and exposed API of Serving? Is there a legitimate reason someone may want to bypass the queue-proxy? That kind of information would help determine whether this is something we can ignore, something we should document and tell users why they may want to configure their cluster in this way, or something we should enforce and need to find a solution with fewer side-effects than a NetworkPolicy like this.
The main problem is that we can't properly scale if there's traffic going past QP.
FWIW, knowing that everything goes thru the QP is helpful to ensure that we can do pre & post message processing if needed w/o asking the user to do anything special in their code - or even be aware of it. This gives us a place to add hooks to do some additional infrastructure logic (ie. magic) for the user in our attempt to make their life easier.
I agree that for scaling purposes, the primary traffic needs to flow through the queue-proxy. And, it does today. A user has to intentionally try to bypass the internal or external URL of their Knative Service and directly hit a Knative Revision pod to bypass the queue-proxy.
The open question is whether we should prohibit users from being able to do that. You have to specifically go out of your way to bypass the queue-proxy and hit a Revision pod directly, so is it safe to assume someone doing that has a specific reason they want/need to in a specific circumstance? Or do we have a gap somewhere where users are accidentally sending traffic directly to a Revision pod?
So what do we want to do about this? Should I add this to a serving meeting agenda to go over? It's not an urgent issue but it seems like there has been more discussion around it recently.
I went ahead and closed the PR I had open to address this issue as it doesn't seem like we have an agreement on what we want to do.
cc @igorbelianski