Configuring Knative to use a local gateway in another namespace
(sorry, cross-post with knative-sandbox/net-istio#985. I posted there before realising this is likely an operator issue)
**Describe the bug**
I am trying to use Knative Serving via the Knative Operator to make ksvcs available via a local gateway (not a service mesh). The Gateway I have is shared with other applications, and the workload pods for that gateway live in a separate namespace. I'm having trouble getting the local (cluster-internal) side of the gateway to work, specifically because the istio-proxy deployment is in a different namespace from Knative Serving and any ksvc. I wonder if a service that the net-istio controller needs to generate is missing or misconfigured.
**Expected behavior**
I'd expect the operator to configure Gateways and Services that allow for in-cluster communication.
**To Reproduce**
I have:

- deployed Istio in `namespace-1`, including:
  - an `image: istio/proxy` deployment working as a gateway, deployed with selector `istio: ingressgateway`
  - an external `Gateway` called `my-gateway` pointing to the above proxy
  - a `svc` called `istio-ingressgateway-workload` that points at the above proxy via a selector
- deployed the Knative Operator
- created a `KnativeServing` object in `namespace/knative-serving` using:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving-app
  namespace: knative-serving
spec:
  version: "1.7.1"
  config:
    istio:
      gateway.namespace-1.my-gateway: "istio-ingressgateway-workload.namespace-1.svc.cluster.local"
      local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.knative-serving.svc.cluster.local"
    domain:
      10.64.140.43.nip.io: ""
```
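For context, the pre-existing `Gateway` and workload `Service` from the first step look roughly like this (a sketch; the port numbers, protocols, and service type are my assumptions beyond what is stated above):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
  namespace: namespace-1
spec:
  # Selects the istio/proxy pods by label
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway-workload
  namespace: namespace-1
spec:
  type: LoadBalancer  # assumption: however external traffic reaches the proxy
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      targetPort: 8080  # the external route through the gateway
```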
Once I have these, I deploy a cloudevents player with:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cloudevents-player-1
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
    spec:
      containers:
        - image: quay.io/ruben/cloudevents-player:0.2
```
Once up, I can connect to the ksvc through the external gateway no problem using:

```shell
curl -v http://cloudevents-player-1.namespace-1.10.64.140.43.nip.io -H "Content-Type: application/json" -H "Ce-Id: foo" -H "Ce-Specversion: 1.0" -H "Ce-Type: dev.example.events" -H "Ce-Source: curl-source" -d '{"msg":"Can you hear me?"}'
```

which yields a `202 Accepted`. However, if I try the local ingress, I get connection refused:

```shell
curl -v http://cloudevents-player-1.namespace-1.svc.cluster.local -H "Content-Type: application/json" -H "Ce-Id: foo" -H "Ce-Specversion: 1.0" -H "Ce-Type: dev.example.events" -H "Ce-Source: curl-source" -d '{"msg":"Can you hear me?"}'
```

which yields `Failed to connect ... port 80: Connection refused`.
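For what it's worth, the cluster-local hostname above resolves at all because Knative Serving creates an ExternalName Service named after the ksvc in its namespace. A sketch of what I believe that generated Service looks like (my understanding of the mechanism, not output I captured):

```yaml
# Generated by Knative Serving for the ksvc (sketch)
apiVersion: v1
kind: Service
metadata:
  name: cloudevents-player-1
  namespace: namespace-1
spec:
  type: ExternalName
  # DNS for cloudevents-player-1.namespace-1.svc.cluster.local CNAMEs to the
  # local gateway service; if that service has no ready endpoints, the
  # connection is refused, as seen above.
  externalName: knative-local-gateway.knative-serving.svc.cluster.local
```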
**Knative release version**
1.1 and 1.7
**Additional context**
I think this is because `svc/knative-local-gateway`, which is generated by the net-istio controller, tries to point to the proxy via a selector, but the proxy is in a different namespace. The generated service (truncated a bit):
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
  name: knative-local-gateway
  namespace: knative-serving
spec:
  ports:
    - name: http2
      port: 80
      protocol: TCP
      targetPort: 8081
  selector:
    istio: ingressgateway
  type: ClusterIP
```
which is in `namespace/knative-serving` and tries to select pods with `selector: {istio: ingressgateway}`. The problem is that there are no `istio: ingressgateway` pods in `namespace/knative-serving`; they are deployed in `namespace/namespace-1`.

Am I configuring this incorrectly? I don't see any other settings that might change this. Maybe the intent is for me to point the local gateway at my own service, but then I don't understand why Knative would create a `svc/knative-local-gateway` at all. The existing `svc/istio-ingressgateway-workload` that I have would be unsuitable, as it forwards port 80 to 8080 (i.e. the external route through the gateway). I'd need another svc that forwards to port 8081 (like `svc/knative-local-gateway` does).
If everything is configured correctly, I think `svc/knative-local-gateway` needs either to be created in the namespace of the istio proxy pod, or to be an ExternalName svc that resolves to a service in the istio proxy pod's namespace which in turn selects the proxy pod.
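Concretely, the ExternalName option could look something like this (my sketch of a hypothetical fix, not anything Knative generates today; both resource bodies are assumptions):

```yaml
# Hypothetical: an ExternalName Service in knative-serving that bridges the
# namespace boundary to a real ClusterIP Service next to the proxy pods.
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: knative-serving
spec:
  type: ExternalName
  externalName: knative-local-gateway.namespace-1.svc.cluster.local
---
# A ClusterIP Service in the proxy's namespace, where the selector can
# actually match the istio: ingressgateway pods.
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: namespace-1
spec:
  type: ClusterIP
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      protocol: TCP
      targetPort: 8081
```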
Clarifying one thing: by default Istio's `Gateway` objects do work across namespaces (a Gateway in `namespace-1` can refer by selector to a deployment of `image: istio/proxy` in `namespace-2`), so I think the Gateway objects themselves are fine above. The problem is the Service that serves as the entryway into the cluster-local gateway; that Service cannot bridge the namespace boundary by itself, because a Service selector only matches pods in its own namespace.
After some more tweaking, I found that setting:

```yaml
local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
```

resulted in:
- creation of `gateway/knative-ingress-gateway` and `gateway/knative-local-gateway`. These seem to always be created, regardless of whether they're cited in the config, and select pods via `selector: {istio: ingressgateway}`. They open ports 80 and 8081, respectively.
- serving using the `gateway/knative-local-gateway` from the previous bullet, because it is cited in the config
- the creation of `svc/knative-local-gateway` in `namespace/namespace-1`. This service forwards traffic to `selector: {istio: ingressgateway}` port 8081. Since this svc is located in the namespace of the gateway pods it points to, it actually works out of the box as I hoped.
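Putting the working configuration together, the full `KnativeServing` spec then becomes (identical to my original spec except for the `local-gateway` value):

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving-app
  namespace: knative-serving
spec:
  version: "1.7.1"
  config:
    istio:
      gateway.namespace-1.my-gateway: "istio-ingressgateway-workload.namespace-1.svc.cluster.local"
      # Pointing the value at namespace-1 causes net-istio to create
      # svc/knative-local-gateway in namespace-1, next to the proxy pods,
      # so its selector can actually match them.
      local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
    domain:
      10.64.140.43.nip.io: ""
```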
Is this the expected behaviour? Maybe this is mainly a documentation problem, as it didn't feel clear this would happen and I only found it by accident.
Oddly, if I change the config to:

```yaml
local-gateway.namespace-1.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
```

it then creates the `svc/knative-local-gateway` in `namespace/istio-system`, which seems to be a recurrence of knative/operator#565. Maybe what I'm describing is in the domain of the operator rather than net-istio? I'm not sure.
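For anyone else hitting this, my reading of the config key format, inferred from the behaviour above (this is my interpretation, not semantics I've seen documented):

```yaml
config:
  istio:
    # Key shape: local-gateway.<gateway-namespace>.<gateway-name>: "<service-fqdn>"
    # The namespace in the *value's* FQDN appears to determine where net-istio
    # creates svc/knative-local-gateway; changing the namespace segment of the
    # *key* instead seems to trigger the knative/operator#565 behaviour.
    local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.namespace-1.svc.cluster.local"
```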
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`

/lifecycle stale