It is possible to bypass the queue-proxy
/area networking
What version of Knative?
v0.11.0-125-gb8f7090cc
Expected Behavior
From within the cluster, it shouldn't be possible to connect directly to the user container and bypass the queue proxy for a revision.
I would expect that this would be prohibited, possibly via network policy.
Actual Behavior
Port 8080 of the user container is exposed and available
Note: this only works if the revision is scaled to 1 or more instances already
Steps to Reproduce the Problem
Deploy a knative service ("webapp" in my example)
Get the PodIP:
```
% kubectl get pods/webapp-mrpn8-deployment-6559dcff9b-c2pxx -oyaml | grep "podIP:"
podIP: 10.36.0.28
```
From on-cluster, I am able to connect directly to port 8080 on that pod:
```
# curl http://10.36.0.28:8080
<html>
<head>
<title>Hello there...</title>
</head>
```
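For reference, a minimal Go sketch of the same check (the pod IP is the hypothetical one from above, and the program has to run from a pod or node inside the cluster):

```go
// A minimal sketch of the direct-to-pod request above, bypassing the queue-proxy.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical pod IP of the revision pod; substitute the one from `kubectl get pods`.
	resp, err := http.Get("http://10.36.0.28:8080")
	if err != nil {
		log.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("reading body failed: %v", err)
	}
	fmt.Printf("status: %s\n%s", resp.Status, body)
}
```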
Something like the following NetworkPolicy might work (untested):
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchExpressions:
        - key: serving.knative.dev/release
          operator: Exists
      podSelector:
        matchLabels:
          app: activator
    ports:
    - protocol: TCP
      port: queue-port
```
(and a similar NetworkPolicy to allow traffic from the HTTP load-balancers, which would be network stack specific)
/kind good-first-issue
Note that NetworkPolicies are additive, so it should be possible to apply the above NetworkPolicy alongside a NetworkPolicy that allows the load-balancer traffic.
Note that there are no good namespace labels to select on, at least for the default suggested Istio installation...
Ref: https://kubernetes.io/docs/concepts/services-networking/network-policies/
Is it something we want to enforce? We could simply document it: tell customers not to open a socket on all interfaces, but only on loopback. It could go in the spec along with listening on the port provided by the PORT env variable.
i.e.
`http.ListenAndServe("localhost:8080", nil)`
instead of
`http.ListenAndServe(":8080", nil)`
And we should start with our samples: https://github.com/knative/docs/blob/master/docs/serving/samples/hello-world/helloworld-go/helloworld.go#L30.
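To make that concrete, here is a minimal sketch of a hello-world style server that honors the PORT env variable Knative injects and binds only to loopback (illustrative, not the actual sample code; the queue-proxy shares the pod's network namespace, so it can still reach the container over localhost):

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
)

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello World!")
}

func main() {
	// Knative provides the port to listen on via the PORT env variable.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	http.HandleFunc("/", handler)

	// Bind to loopback only: the queue-proxy in the same pod can still reach
	// us via localhost, but other pods on the cluster network cannot.
	addr := net.JoinHostPort("localhost", port)
	log.Printf("listening on %s", addr)
	log.Fatal(http.ListenAndServe(addr, nil))
}
```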
Code I used to validate what I described above: https://github.com/JRBANCEL/Experimental/tree/master/IsolateContainerInsidePod
Requiring all containers to be conscious of network security seems like a problem, particularly if users end up reusing common containers (e.g. hugo for serving a website with the website layer added to a base hugo image).
/assign mpetason
@evankanderson I'm going to try this issue and see if I can resolve it. Would the first step be to get this working for the default namespace, then discuss the best way to select which namespaces should be covered? If we weren't using additional namespaces after installation we could require it in the config, but users will probably create additional namespaces besides default. I'm getting up to speed on how some of this works; I'll see if I can get it to block in default first (should be easy with the examples) and then maybe push a PR to discuss additional namespaces.
Is this only surfaced because the user-port is part of the containerPorts list in K8s? What happens if we remove it from that list during deployment? Is the port still surfaced?
Tried it, even if we don't surface it, 8080 is curlable.
Yes, it doesn't matter. Only listening on loopback prevents external connections from being established. See my comment above.
well, or a network rule, as Evan mentioned (definitely for meshes).
I did some testing but wasn't able to find where the traffic comes from after it gets cut over from the activator. I tested out a few different sources, but the main one I tested was istio-ingressgateway.
Here's the Network Policy that I thought would work:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy-lb
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: istio-system
    - podSelector:
        matchLabels:
          app: istio-ingressgateway
    ports:
    - protocol: TCP
      port: queue-port
```
Activator Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchExpressions:
        - key: serving.knative.dev/release
          operator: Exists
      podSelector:
        matchLabels:
          app: activator
    ports:
    - protocol: TCP
      port: queue-port
```
Test steps: I changed target-burst-capacity to 0 so the activator would be removed from the path once a revision was up (Victor recommended setting minScale=1 instead), then applied both network policies. The first hit works as it should because of the activator NetworkPolicy; then the pod scales to 1 and traffic fails. I expected the Istio NetworkPolicy to allow the traffic, but it fails with "upstream connect error or disconnect/reset before headers. reset reason: connection failure".
I checked a few different logs but didn't see anything useful. I might not be looking in the right place.
Any ideas on where traffic is coming from after the revisions scale?
To be precise, Victor recommended applying minScale=1 in addition to tbc=0, not instead of it.
I asked internally and had a response that might work out as well -
"for this use case wouldn't just allow-from-all to queuePort work - i.e. no need for a from: .. labels at all? that'd block all the other ports, I think, and let queuePort only through - iiuc we're ok with direct access to queuePort from anywhere so that seems to do what we want?"
Tested this out using a policy that only specifies allowable ports, but allows traffic from anywhere.
Looks like it works when hitting the endpoint from a browser.
It fails when trying to curl from another pod in the environment using the podIP and 8080.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: knative-serving-user-container-policy
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: serving.knative.dev/revision
      operator: Exists
  policyTypes:
  - Ingress
  ingress:
  - from:
    ports:
    - protocol: TCP
      port: queue-port
```
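For reference, a hedged sketch of the kind of check an integration test could make with this policy in place (the pod IP is hypothetical and would come from the test harness, and 8012 is assumed to be the queue-proxy's queue-port; this is not the project's actual e2e framework):

```go
package e2e

import (
	"fmt"
	"net/http"
	"testing"
	"time"
)

// reachable reports whether a plain GET to the URL connects at all.
func reachable(url string) bool {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return true
}

// TestDirectPodAccess assumes it runs inside the cluster against a known Revision pod.
func TestDirectPodAccess(t *testing.T) {
	podIP := "10.36.0.28" // hypothetical; a real test would discover this via the Kubernetes API

	// Direct access to the queue-proxy port should still be allowed by the policy.
	if !reachable(fmt.Sprintf("http://%s:8012", podIP)) {
		t.Errorf("expected the queue-port (8012) to be reachable")
	}

	// Direct access to the user container should now be blocked.
	if reachable(fmt.Sprintf("http://%s:8080", podIP)) {
		t.Errorf("expected direct access to the user container (8080) to be blocked")
	}
}
```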
Can we try adding this and run our integration tests? If things work fine, we can add an additional test that bypasses the queue-proxy (hits the user container directly) and confirms failure.
Since this allows all traffic to the queue-port and only selects Knative revision pods, should this be something that is set up during install (config file) instead of for each revision?
Yes, this looks like a one-time setup during install, so no need to do it per Revision. Which is also great because there's no code change, and users who dislike the restriction could easily remove it themselves (and potentially replace it with something else to their liking).
The NetworkPolicy will be applied for default, but it doesn't cover other namespaces. We have a couple of options that I can think of; please feel free to add any others I might not have thought of:
- Use this during the original setup and cover the default namespace. Add documentation explaining how this works and recommend that the user applies it to other namespaces.
- Apply the NetworkPolicy per revision so that it will cover all namespaces. Probably have to turn this into an option eventually in case end users want to turn it off.
Looking for feedback/ideas about how we want this to work.
Another option: one NetworkPolicy per namespace. When creating a new revision, check if a NetworkPolicy is set up for the namespace; if there isn't one, create it.
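A rough sketch of that per-namespace check using client-go (the policy name and function are hypothetical, the context-aware Get/Create signatures assume a recent client-go, and a real implementation would live in the revision reconciler rather than a standalone helper):

```go
package netpolicy

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

const policyName = "knative-serving-user-container-policy" // hypothetical name

// EnsureNamespacePolicy creates the allow-queue-port-only NetworkPolicy in the
// given namespace if it does not already exist. Intended to be called when a
// Revision is reconciled.
func EnsureNamespacePolicy(ctx context.Context, kc kubernetes.Interface, ns string) error {
	_, err := kc.NetworkingV1().NetworkPolicies(ns).Get(ctx, policyName, metav1.GetOptions{})
	if err == nil {
		return nil // already present, nothing to do
	}
	if !apierrors.IsNotFound(err) {
		return err
	}

	tcp := corev1.ProtocolTCP
	queuePort := intstr.FromString("queue-port")
	policy := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: policyName, Namespace: ns},
		Spec: networkingv1.NetworkPolicySpec{
			// Select every Revision pod in the namespace.
			PodSelector: metav1.LabelSelector{
				MatchExpressions: []metav1.LabelSelectorRequirement{{
					Key:      "serving.knative.dev/revision",
					Operator: metav1.LabelSelectorOpExists,
				}},
			},
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
			// Allow traffic from anywhere, but only to the queue-proxy port.
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				Ports: []networkingv1.NetworkPolicyPort{{Protocol: &tcp, Port: &queuePort}},
			}},
		},
	}
	_, err = kc.NetworkingV1().NetworkPolicies(ns).Create(ctx, policy, metav1.CreateOptions{})
	return err
}
```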
I'll be working on the one network policy per namespace idea for now. The PR is WIP currently.
A big -1 on adding default NetworkPolicy objects out of the box.
In my experience, it's quite hard to come up with an appropriate NetworkPolicy out of the box that won't have unwanted side-effects. Today, with us shipping no NetworkPolicy objects, users can install their own if they want a Knative Service to be isolated from specific other namespaces, specific other pods, or so on.
The linked PR #7700, for example, adds a NetworkPolicy that allows any traffic from any pod or any namespace to the queue-proxy port of every Knative Revision pod. So, with that in place, how would a user now decide that certain Knative Revision pods should only be accessible from specific other pods? Specific other namespaces? By trying to lock things down so you can only hit the queue-proxy (a relatively minor problem), we've opened up a much bigger can of worms by impacting the ability of users to actually restrict traffic to their Knative Services as needed.
@bbrowning I agree, it will be more difficult to control traffic. Maybe we can get more feedback on how we want to approach this. It was suggested that we document this, but also that maybe we should be proactive and enforce it.
If we document it then we could have a sample network policy that will block all traffic except for what is destined for queue-proxy.
I'm open to anything. I'm looking for issues to work on to help :P
Perhaps @mikehelmick or someone else can elaborate on why a user shouldn't be able to bypass the queue-proxy. Is it a concern around bad actors? Accidental misuse of Serving? Conformance and exposed API of Serving? Is there a legitimate reason someone may want to bypass the queue-proxy? That kind of information would help determine whether this is something we can ignore, something we should document and tell users why they may want to configure their cluster in this way, or something we should enforce and need to find a solution with fewer side-effects than a NetworkPolicy like this.
The main problem is that we can't properly scale if there's traffic going past QP.
FWIW, knowing that everything goes thru the QP is helpful to ensure that we can do pre & post message processing if needed w/o asking the user to do anything special in their code - or even be aware of it. This gives us a place to add hooks to do some additional infrastructure logic (ie. magic) for the user in our attempt to make their life easier.
I agree that for scaling purposes, the primary traffic needs to flow through the queue-proxy. And, it does today. A user has to intentionally try to bypass the internal or external URL of their Knative Service and directly hit a Knative Revision pod to bypass the queue-proxy.
The open question is whether we should prohibit users from being able to do that. You have to specifically go out of your way to bypass the queue-proxy and hit a Revision pod directly, so is it safe to assume someone doing that has a specific reason they want/need to in a specific circumstance? Or do we have a gap somewhere where users are accidentally sending traffic directly to a Revision pod?
So what do we want to do about this? Should I add this to a serving meeting agenda to go over? It's not an urgent issue but it seems like there has been more discussion around it recently.
I went ahead and closed the PR I had open to address this issue as it doesn't seem like we have an agreement on what we want to do.
cc @igorbelianski