Configuring Knative to use a local gateway in another namespace
(sorry, cross-post with knative-sandbox/net-istio#985. I posted there before realising this is likely an operator issue)
**Describe the bug**
I am trying to use Knative Serving via the Knative Operator to make ksvcs available via a local gateway (not a service mesh). The Gateway I have is shared with other applications, and the workload pods for that gateway live in a separate namespace. I'm having trouble getting the local (cluster-internal) side of the gateway to work, specifically because the istio-proxy deployment is in a different namespace from Knative Serving and any ksvc. I wonder if a service that the net-istio controller needs to generate is missing or misconfigured.
**Expected behavior**
I'd expect the operator to configure Gateways and Services that allow for in-cluster communication.
**To Reproduce**
I have:

- deployed Istio in `namespace-1`, including:
  - an `image: istio/proxy` deployment working as a gateway, deployed with selector `istio: ingressgateway`
  - an external `Gateway` called `my-gateway` pointing to the above proxy
  - a `svc` called `istio-ingressgateway-workload` that points at the above proxy via a selector
- deployed the Knative Operator
- created a `KnativeServing` object in `namespace/knative-serving` using:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving-app
  namespace: knative-serving
spec:
  version: "1.7.1"
  config:
    istio:
      gateway.namespace-1.my-gateway: "istio-ingressgateway-workload.namespace-1.svc.cluster.local"
      local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.knative-serving.svc.cluster.local"
    domain:
      10.64.140.43.nip.io: ""
```
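For context, the pre-existing `Gateway` and workload `Service` from the first step look roughly like this (a sketch; the port numbers, protocols, and service type are my assumptions beyond what is stated above):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
  namespace: namespace-1
spec:
  # Selects the istio/proxy pods by label
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway-workload
  namespace: namespace-1
spec:
  type: LoadBalancer  # assumption: however external traffic reaches the proxy
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      targetPort: 8080  # the external route through the gateway
```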
Once I have these, I deploy a cloudevents player with:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cloudevents-player-1
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
    spec:
      containers:
        - image: quay.io/ruben/cloudevents-player:0.2
```
Once up, I can connect to the ksvc through the external gateway no problem using:

```shell
curl -v http://cloudevents-player-1.namespace-1.10.64.140.43.nip.io -H "Content-Type: application/json" -H "Ce-Id: foo" -H "Ce-Specversion: 1.0" -H "Ce-Type: dev.example.events" -H "Ce-Source: curl-source" -d '{"msg":"Can you hear me?"}'
```

which yields a `202 Accepted`. However, if I try the local ingress, I get connection refused:

```shell
curl -v http://cloudevents-player-1.namespace-1.svc.cluster.local -H "Content-Type: application/json" -H "Ce-Id: foo" -H "Ce-Specversion: 1.0" -H "Ce-Type: dev.example.events" -H "Ce-Source: curl-source" -d '{"msg":"Can you hear me?"}'
```

which yields `Failed to connect ... port 80: Connection refused`.
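For what it's worth, the cluster-local hostname above resolves at all because Knative Serving creates an ExternalName Service named after the ksvc in its namespace. A sketch of what I believe that generated Service looks like (my understanding of the mechanism, not output I captured):

```yaml
# Generated by Knative Serving for the ksvc (sketch)
apiVersion: v1
kind: Service
metadata:
  name: cloudevents-player-1
  namespace: namespace-1
spec:
  type: ExternalName
  # DNS for cloudevents-player-1.namespace-1.svc.cluster.local CNAMEs to the
  # local gateway service; if that service has no ready endpoints, the
  # connection is refused, as seen above.
  externalName: knative-local-gateway.knative-serving.svc.cluster.local
```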
**Knative release version**
1.1 and 1.7
**Additional context**
I think this is because `svc/knative-local-gateway`, which is generated by the net-istio controller, tries to point to the proxy via a selector, but the proxy is in a different namespace. The generated service (truncated a bit):
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
  name: knative-local-gateway
  namespace: knative-serving
spec:
  ports:
    - name: http2
      port: 80
      protocol: TCP
      targetPort: 8081
  selector:
    istio: ingressgateway
  type: ClusterIP
```
which is in `namespace/knative-serving` and tries to select pods with `selector: {istio: ingressgateway}`. The problem is that there are no `istio: ingressgateway` pods in `namespace/knative-serving`; they are deployed in `namespace/namespace-1`.

Am I configuring this incorrectly? I don't see any other settings that might change this. Maybe the intent is for me to point the local gateway at my own service, but then I don't understand why Knative would create a `svc/knative-local-gateway` at all. The existing `svc/istio-ingressgateway-workload` that I have would be unsuitable, as it forwards port 80 to 8080 (i.e. the external route through the gateway). I'd need another svc that forwards to port 8081 (like `svc/knative-local-gateway` does).
If everything is configured correctly, I think `svc/knative-local-gateway` needs either to be created in the namespace of the istio proxy pod, or to be an ExternalName svc that resolves to a service in the istio proxy pod's namespace which in turn selects the proxy pod.
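Concretely, the ExternalName option could look something like this (my sketch of a hypothetical fix, not anything Knative generates today; both resource bodies are assumptions):

```yaml
# Hypothetical: an ExternalName Service in knative-serving that bridges the
# namespace boundary to a real ClusterIP Service next to the proxy pods.
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: knative-serving
spec:
  type: ExternalName
  externalName: knative-local-gateway.namespace-1.svc.cluster.local
---
# A ClusterIP Service in the proxy's namespace, where the selector can
# actually match the istio: ingressgateway pods.
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: namespace-1
spec:
  type: ClusterIP
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      protocol: TCP
      targetPort: 8081
```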
Clarifying one thing: by default Istio's `Gateway` objects do work across namespaces (a Gateway in `namespace-1` can refer by selector to a deployment of `image: istio/proxy` in `namespace-2`), so I think the Gateway objects themselves are fine above. The problem is the Service that serves as the entryway into the cluster-local gateway; that Service cannot bridge the namespace boundary by itself, because a Service selector only matches pods in its own namespace.
After some more tweaking, I found that setting:

```yaml
local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
```

resulted in:
- creation of `gateway/knative-ingress-gateway` and `gateway/knative-local-gateway`. These seem to always be created, regardless of whether they're cited in the config, and select pods via `selector: {istio: ingressgateway}`. They open ports 80 and 8081, respectively.
- serving using the `gateway/knative-local-gateway` from the previous bullet, because it is cited in the config
- the creation of `svc/knative-local-gateway` in `namespace/namespace-1`. This service forwards traffic to `selector: {istio: ingressgateway}` port 8081. Since this svc is located in the namespace of the gateway pods it points to, it actually works out of the box as I hoped.
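Putting the working configuration together, the full `KnativeServing` spec then becomes (identical to my original spec except for the `local-gateway` value):

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving-app
  namespace: knative-serving
spec:
  version: "1.7.1"
  config:
    istio:
      gateway.namespace-1.my-gateway: "istio-ingressgateway-workload.namespace-1.svc.cluster.local"
      # Pointing the value at namespace-1 causes net-istio to create
      # svc/knative-local-gateway in namespace-1, next to the proxy pods,
      # so its selector can actually match them.
      local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
    domain:
      10.64.140.43.nip.io: ""
```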
Is this the expected behaviour? Maybe this is mainly a documentation problem, as it didn't feel clear this would happen and I only found it by accident.
Oddly, if I change the config to:

```yaml
local-gateway.namespace-1.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
```

it then creates the `svc/knative-local-gateway` in `namespace/istio-system`, which seems to be a recurrence of knative/operator#565. Maybe what I'm describing is in the domain of the operator rather than net-istio? I'm not sure.
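For anyone else hitting this, my reading of the config key format, inferred from the behaviour above (this is my interpretation, not semantics I've seen documented):

```yaml
config:
  istio:
    # Key shape: local-gateway.<gateway-namespace>.<gateway-name>: "<service-fqdn>"
    # The namespace in the *value's* FQDN appears to determine where net-istio
    # creates svc/knative-local-gateway; changing the namespace segment of the
    # *key* instead seems to trigger the knative/operator#565 behaviour.
    local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
```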
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`

/lifecycle stale