
Open Service Mesh (OSM) support

s-bauer opened this issue 1 year ago • 0 comments

This is my evolving brain-dump on how to support Bridge to Kubernetes with Open Service Mesh (OSM). I assume it'll work similarly with other service meshes.

Sidecar Injection

OSM automatically injects an Envoy sidecar into all pods if the namespace has OSM support enabled. The first issue with this is that both Bridge to Kubernetes and OSM use the same name, "envoy", for their containers. This obviously leads to conflicts, which result in the pods not being created. The fix is fairly easy: just rename the container in Bridge to Kubernetes from "envoy" to e.g. "bridge-envoy" in RoutingStateEstablisher.cs.

In addition, multiple Envoy instances seem to conflict with each other. A solution is to add the --base-id parameter to the Envoy invocation and pass a value other than "0". I assume using --use-dynamic-base-id instead of a hardcoded base ID is a better idea; I'll need to test that. Either way, the code change is simply adding e.g. --base-id 3 to the args in RoutingStateEstablisher.cs.
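
A minimal sketch of what the patched sidecar could look like with both changes applied (this is illustrative only — the real container spec is generated in RoutingStateEstablisher.cs, and the image tag and args layout here are assumptions):

```yaml
# Hypothetical pod spec fragment. The actual spec is generated by
# Bridge to Kubernetes; container name, image, and arg values are illustrative.
containers:
- name: bridge-envoy          # renamed from "envoy" to avoid clashing with OSM's sidecar
  image: envoyproxy/envoy:v1.24.0
  args:
  - -c
  - /etc/envoy/envoy.yaml
  - --base-id
  - "3"                       # any non-zero value; --use-dynamic-base-id may be safer
```
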

Envoy Routing

The next issue was the Envoy routing configuration. It turns out that with OSM you need to send the correct "Host" header for requests to make it to the destination service. Fixing this is again fairly easy:

  • Change the cluster type from static_dns to logical_dns
  • Add auto_host_rewrite: true to the route

An example might look like this:

Cluster:

clusters:
- name: service_original_clone_80_80
  connect_timeout: 1.00s
  type: logical_dns
  load_assignment:
    cluster_name: service_original_clone_80_80
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: api-engagement-cloned-routing-svc.platform-dev
              port_value: 80

Route:

- match:
    prefix: /
  route:
    cluster: service_original_clone_80_80
    timeout: 0s
    idle_timeout: 0s
    auto_host_rewrite: true

At this point, traffic routed to the original service works fine, all with injected OSM sidecars! One issue remains, though.

Routing to Pod

When Bridge to Kubernetes matches the given routing header, it tries to forward traffic directly to the routing pod (I guess there's a better name for it; it's the one that runs the lpkremoteagent image). It does that by connecting directly to the pod's IP. Unfortunately, this doesn't work with OSM. Instead, a service needs to be created that exposes this pod, and the Envoy config needs to be adjusted once more:

  • Change the cluster type from static to logical_dns
  • Add auto_host_rewrite: true to the route

These are the same steps taken above to route the traffic to the original service, only now for the routing pod/service.
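
The service in front of the routing pod might look something like this (a sketch only — the name, namespace, and selector labels are assumptions derived from the pod name used in the cluster config below; Bridge to Kubernetes would need to generate the real selector):

```yaml
# Hypothetical Service exposing the lpkremoteagent routing pod, so the
# OSM-managed Envoy can reach it via a DNS name instead of a raw pod IP.
# Name, namespace, and selector labels are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: myusername-api-engagement-routing-svc
  namespace: platform-dev
spec:
  selector:
    app: myusername-api-engagement   # assumed label on the routing pod
  ports:
  - port: 80
    targetPort: 80
```
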

Cluster:

clusters:
  - name: service_debug_withHeader_SomeHeader_SomeValue_80_80
    connect_timeout: 1.00s
    type: logical_dns
    load_assignment:
      cluster_name: service_debug_withHeader_SomeHeader_SomeValue_80_80
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: myusername-api-engagement-77665f9bbc-k9wzp.platform-dev
                port_value: 80

Route:

- match:
    headers:
    - name: SomeHeader
      exact_match: SomeValue
    prefix: /
  route:
    cluster: service_debug_withHeader_SomeHeader_SomeValue_80_80
    timeout: 0s
    idle_timeout: 0s
    auto_host_rewrite: true

It Works?

As far as I can tell, YES! Traffic to both the original service as well as the local machine works as expected!

Next Steps

  • [x] Proof of Concept
  • [ ] Get Feedback from BridgeToKubernetes contributor
  • [x] Adjust Code to
    • [x] Change name of container from "envoy" to "bridge-envoy"
    • [x] Change Envoy configuration to set the Host header correctly
    • [x] Create a service for the lpkremoteagent pod
    • [x] Change Envoy configuration to route to service instead of Pod IP
  • [ ] Testing

I'd like to get feedback from Microsoft on my approach and its potential implications. In my opinion, these changes will work in clusters with or without a service mesh. If we're lucky, the change even works generically with all service meshes, not just Open Service Mesh (OSM); I currently don't see any reason why it wouldn't. This would greatly increase the quality of Bridge to Kubernetes and allow it to be used in more clusters.

s-bauer — Mar 28 '23 20:03