Istio Compat: Routes to Services In Workspace with Service Isolation Enabled Fail Without Workaround

Open bgottfried91 opened this issue 2 years ago • 10 comments

Gloo Edge Version

1.12.x (beta)

Kubernetes Version

1.23.x

Describe the bug

#6195 and #6196 added support for full compatibility between a GE Gateway Proxy and services in the Mesh, but that support only works for cross-workspace traffic (e.g. from a gateways workspace to the bookinfo workspace) when the target workspace does not have Service Isolation enabled. If Service Isolation is enabled on the target workspace, any GE route that does not set a hostRewrite* fails when it tries to reach a service in that workspace.

*You can work around this by setting a hostRewrite of servicename.namespace for each route. See additional context for why this isn't a good long-term solution.

Steps to reproduce the bug

  1. Deploy Gloo Mesh and register a cluster with it.
  2. Deploy Gloo Edge on the target cluster. Make sure to set global.istioIntegration.enableIstioSidecarOnGateway in the Helm values file and ensure that the Gloo Edge gateway proxy has a sidecar.
  3. Deploy an application on the target cluster that Edge will target (e.g. bookinfo).
  4. Create two workspaces on the registered cluster: one that covers the Gloo Edge namespace and one that covers the application namespace. Ensure that Service Isolation is enabled for the application workspace.
  5. Create a Virtual Service for Gloo Edge that targets an application in the application namespace (e.g. productpage).
  6. Try to reach the application through that VS on the Gloo Edge proxy URL. It should fail with the error "upstream connect error or disconnect/reset before headers. reset reason: connection termination".
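
For step 2, the Helm setting named above would sit in the values file roughly as follows. This is a minimal sketch that assumes the flag is a boolean set to true and shows no other overrides; confirm after installing that the gateway proxy pod actually has a sidecar.

```yaml
# Gloo Edge Helm values (fragment): only the key named in step 2.
# Assumption: the flag is a boolean and `true` enables the sidecar.
global:
  istioIntegration:
    enableIstioSidecarOnGateway: true
```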

Expected Behavior

When Gloo Edge is configured for compatibility with Gloo Mesh, creating a route to a service in the Mesh should not require any special route configuration beyond pointing the route at the service upstream.

Additional Context

While there is a workaround that avoids this in the short term, this is a gap in the Istio compatibility of Edge with Mesh. Even in the short term, we have clients trying to use this integration in order to get access to Portal, and in the long term we may have clients that do not want to leave Edge but do want to add Gloo Mesh, so it's worth looking into what it would take for Edge to do this without the workaround. Additionally, I spoke with Neeraj about this briefly, and he pointed out that this hostRewrite could cause issues for applications that expect the hostname sent to Edge by the external traffic. There are apparently ways to rewrite the host again in the receiving sidecar, but at that point we're getting into complex config just to get around this. A better option seems to be finding some way to include the gateway host as a valid service target for any of the services in the mesh, so that the appropriate target gets selected.

bgottfried91 avatar Jun 07 '22 20:06 bgottfried91

@kcbabo @chrisgaun Need to prioritize for a customer.

murphye avatar Jun 08 '22 20:06 murphye

In order for services to be accessed across workspaces in GME with serviceIsolation enabled in the application workspace, you need to export the service from the application workspace and import it into the gateway's workspace. That should generate the necessary resources to allow this communication.
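
As a rough sketch of that export/import, the settings for the two workspaces might look like the following. The workspace names (bookinfo, gateways) and the namespaces are illustrative, and the field names are written from memory of the Gloo Mesh v2 WorkspaceSettings API, so treat them as assumptions and check them against the installed version.

```yaml
# Application workspace: Service Isolation on, exported to the gateway's workspace.
# Assumption: names, namespaces, and field layout are illustrative.
apiVersion: admin.gloo.solo.io/v2
kind: WorkspaceSettings
metadata:
  name: bookinfo
  namespace: bookinfo
spec:
  exportTo:
  - workspaces:
    - name: gateways
  options:
    serviceIsolation:
      enabled: true
---
# Gateway workspace: imports from the application workspace.
apiVersion: admin.gloo.solo.io/v2
kind: WorkspaceSettings
metadata:
  name: gateways
  namespace: gloo-system
spec:
  importFrom:
  - workspaces:
    - name: bookinfo
```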

Separately but related - if you have serviceIsolation enabled on the gateway workspace, you'll need to create a GME AccessPolicy to allow traffic to that gateway from non-mesh services.

Sodman avatar Jul 28 '22 16:07 Sodman

Apologies, I didn't clarify that in the reproduction steps, but in our test we had set up two-way importing and exporting (i.e. each workspace both imported from and exported to the other) of everything in the workspaces and still had this issue. The issue took a lot of digging (and Eitan and Yuval's assistance) to debug and seemed to be related to the hostname that the request came in on.

My understanding is that the request came in with the hostname of the Edge gateway (whatever hostname was used to hit it from outside the cluster) and was routed based on an Edge VS. The request gets grabbed by Edge's sidecar, but the sidecar doesn't have that hostname in its list of services in the mesh (since Edge was registered internally under its servicename.namespace name), so it treats the request as going to a service outside the mesh and doesn't send it with mTLS. The request is then received by the server sidecar, and since we have Service Isolation enabled, the server sidecar rejects requests without mTLS. I might be misunderstanding parts of what's happening, but I can definitely confirm that the issue occurs even when we have broad importing/exporting set up between the workspaces.

bgottfried91 avatar Jul 28 '22 17:07 bgottfried91

@bgottfried91 what was the host rewrite you used as a workaround?

elcasteel avatar Jul 29 '22 14:07 elcasteel

Sorry for the delay! This is what my VS looked like with the workaround:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: prodpage
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      options:
        hostRewrite: productpage.bookinfo
      routeAction:
        single:
          upstream:
            name: bookinfo-productpage-9080
            namespace: gloo-system

Note the hostRewrite to servicename.namespace (here, productpage.bookinfo).

bgottfried91 avatar Jul 29 '22 21:07 bgottfried91

@elcasteel any luck reproducing the issue or the workaround? I can try and get an instance set up in AWS this afternoon for you to take a look at if not.

bgottfried91 avatar Aug 02 '22 13:08 bgottfried91

I reproduced both the issue and the workaround.

elcasteel avatar Aug 02 '22 14:08 elcasteel

To fix this, we're going to automatically rewrite the host header in Gloo when Istio integration is enabled and set x-forwarded-host to preserve the original host. I'm going to implement this as a plugin so that we can later add other transformations specific to the Mesh/Istio integration in one place. Two questions I still need to figure out are:

  1. What if the user already supplied a hostRewrite? I assume we should prefer the user-supplied rewrite over one we generate.
  2. Is servicename.namespace always a good host to use?
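
To make question 1 concrete, here is a sketch of a VirtualService with two routes, modeled on the workaround posted earlier in this thread. It assumes the proposed behavior in which a user-supplied hostRewrite is kept and a rewrite is generated only for routes without one. The /reviews route, its rewrite value, and the bookinfo-reviews-9080 upstream are hypothetical.

```yaml
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: bookinfo
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    # Explicit, user-supplied rewrite: assumed to be left untouched.
    - matchers:
      - prefix: /reviews
      options:
        hostRewrite: reviews.internal.example.com   # hypothetical value
      routeAction:
        single:
          upstream:
            name: bookinfo-reviews-9080             # hypothetical upstream
            namespace: gloo-system
    # No rewrite set: the candidate for a generated
    # servicename.namespace rewrite (productpage.bookinfo here).
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: bookinfo-productpage-9080
            namespace: gloo-system
```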

elcasteel avatar Aug 03 '22 18:08 elcasteel

@elcasteel what's the timeline for closing this out? End of this sprint? I don't actually know when the sprint ends though, tbh XD

bgottfried91 avatar Aug 05 '22 16:08 bgottfried91

Merging this fix will depend on our ability to bump the go-control-plane dependency in solo-projects. We don't have an estimate for that but I'll update when we discuss it.

elcasteel avatar Aug 05 '22 18:08 elcasteel

solo-projects dependency bump currently in progress.

elcasteel avatar Aug 26 '22 15:08 elcasteel

Closing as it is marked as done

chrisgaun avatar Sep 02 '22 13:09 chrisgaun