Istio Compat: Routes to Services In Workspace with Service Isolation Enabled Fail Without Workaround
Gloo Edge Version
1.12.x (beta)
Kubernetes Version
1.23.x
Describe the bug
#6195 and #6196 added support for full compatibility between a GE Gateway Proxy and services in the Mesh, but it turns out the support only works for cross-workspace traffic (e.g. from a gateways workspace to the bookinfo workspace) for workspaces that do not have Service Isolation enabled. If you do enable Service Isolation, any GE route that does not do a hostRewrite* will fail when attempting to access a service in another workspace that has Service Isolation enabled.
*You can work around this by setting a hostRewrite of servicename.namespace for each route. See Additional Context for why this isn't a good long-term solution.
Steps to reproduce the bug
- Deploy Gloo Mesh and register a cluster with it.
- Deploy Gloo Edge on the target cluster. Make sure to set global.istioIntegration.enableIstioSidecarOnGateway in the Helm values file and ensure that the Gloo Edge gateway proxy has a sidecar.
- Deploy an application on the target cluster that Edge will target (e.g. bookinfo).
- Create two workspaces on the registered cluster, one of which covers the Gloo Edge namespace and the other of which covers the application namespace. Ensure that Service Isolation is enabled for the application workspace.
- Create a Virtual Service for Gloo Edge that targets an application in the application namespace (e.g. productpage).
- Try to reach the application through that VS on the Gloo Edge proxy URL. It should fail with the error:
upstream connect error or disconnect/reset before headers. reset reason: connection termination
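For reference, the Helm setting named in the second step would presumably sit in the values file as sketched below. The key path is taken from the step itself; the value true is an assumption, since the step names only the key.

```yaml
# Helm values fragment for the Gloo Edge install (sketch).
# The key path comes from the reproduction steps; "true" is assumed.
global:
  istioIntegration:
    enableIstioSidecarOnGateway: true
```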
Expected Behavior
When Gloo Edge is configured for compatibility with Gloo Mesh, creating a route for a service that is in the Mesh should not require any special configuration on the routes exposed by Gloo Edge beyond pointing them to the service upstreams.
Additional Context
While there is a workaround that can be used to avoid this in the short term, this is a gap in the Istio compatibility of Edge with Mesh. Considering that, even in the short term, we have clients that are trying to use this integration in order to have access to Portal, and that in the long term we may have clients that do not want to leave Edge but do want to add Gloo Mesh, it's worth looking into what it would take for Edge to do this without the workaround. Additionally, I spoke with Neeraj about this briefly and he pointed out that doing this hostRewrite could cause issues for applications that are expecting the hostname sent to Edge from the external traffic. There are apparently ways to rewrite again in the receiving sidecar, but at that point we're getting into complex config to get around this. It seems like a better option to find some way to include the gateway host as a valid service target for any of the services in the mesh so that the appropriate target gets selected.
@kcbabo @chrisgaun Need to prioritize for a customer.
In order for services to be accessed across workspaces in GME with serviceIsolation enabled in the application workspace, you need to export the service from the application workspace and import it into the gateway's workspace. That should generate the necessary resources to allow this communication.
Separately but related - if you have serviceIsolation enabled on the gateway workspace, you'll need to create a GME AccessPolicy to allow traffic to that gateway from non-mesh services.
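A rough sketch of the export/import setup described above, assuming the GME v2 WorkspaceSettings resource and the example workspace names gateways and bookinfo from the issue description. The metadata names and namespaces are hypothetical placeholders, and the field layout should be checked against the GME version in use.

```yaml
# Application workspace: Service Isolation enabled, services exported
# to the gateway workspace. Names and namespaces are placeholders.
apiVersion: admin.gloo.solo.io/v2
kind: WorkspaceSettings
metadata:
  name: bookinfo
  namespace: bookinfo
spec:
  exportTo:
  - workspaces:
    - name: gateways
  options:
    serviceIsolation:
      enabled: true
---
# Gateway workspace: imports what the application workspace exports.
apiVersion: admin.gloo.solo.io/v2
kind: WorkspaceSettings
metadata:
  name: gateways
  namespace: gloo-system
spec:
  importFrom:
  - workspaces:
    - name: bookinfo
```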
Apologies, I didn't clarify that in the reproduction steps, but in the test we did we had set up two-way importing/exporting (i.e. importing to and exporting from both workspaces) of everything in the workspaces and still had this issue. The issue we ran into took a lot of digging (and Eitan and Yuval's assistance to debug) and seemed to be related to the hostname that the request came in on. My understanding of the issue was that the request came in with the hostname of the Edge gateway (whatever hostname was used to hit it from outside the cluster) and was routed based on an Edge VS. The request gets grabbed by Edge's sidecar, but the sidecar doesn't have that hostname in its list of services in the mesh (since it registered Edge using the servicename.namespace name internally), so it treats the request as going to a service outside the mesh and doesn't send it with mTLS. The request is then received by the server sidecar and, since we have Service Isolation enabled, requests without mTLS are rejected by the server sidecar. I might be misunderstanding parts of what's happening, but can definitely confirm that the issue occurs even when we have broad importing/exporting set up between the workspaces.
@bgottfried91 what was the host rewrite you used as a workaround?
Sorry for the delay! This is what my VS looked like with the workaround:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: prodpage
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      options:
        hostRewrite: productpage.bookinfo
      routeAction:
        single:
          upstream:
            name: bookinfo-productpage-9080
            namespace: gloo-system
Note the hostRewrite to servicename.namespace.
@elcasteel any luck reproducing the issue or the workaround? I can try and get an instance set up in AWS this afternoon for you to take a look at if not.
I reproduced both the issue and the workaround.
To fix this, we're going to automatically rewrite the host header in Gloo when Istio integration is enabled and set x-forwarded-host.
I'm going to implement this as a plugin to allow us to add other transformations specific to the Mesh/Istio integration in one place later.
Two questions I still need to figure out are:
- What if the user already supplied a hostRewrite? I assume we should use the user-supplied rewrite over one we generate.
- Is servicename.namespace always a good host to use?
@elcasteel what's the timeline for closing this out? End of this sprint? I don't actually know when the sprint ends though, tbh XD
Merging this fix will depend on our ability to bump the go-control-plane dependency in solo-projects. We don't have an estimate for that but I'll update when we discuss it.
solo-projects dependency bump currently in progress.
Closing as it is marked as done