Support using Traefik as a Service Mesh with Service Fabric

Open lawrencegripper opened this issue 6 years ago • 2 comments

The aim would be to allow services to use a label like traefik.servicefabric.enable-mesh which would publish a service on an internal endpoint.

Inter-service communication can then benefit from Traefik features such as circuit breakers, retries, and rate limiting.

Tasks:

  • update to add support for this label
  • add support for adaptive weighting to prefer routing to services on the local node over remote nodes
  • test using this approach in a large cluster

lawrencegripper avatar Apr 10 '18 14:04 lawrencegripper

Proposal

Create an additional label, traefik.servicefabric.mesh. When this label is added to a service, the service would be added to the mesh endpoint defined in your traefik.toml. The default template would be updated to include this endpoint too.

All labels set on the service would then control the behavior of the service in the mesh, so existing labels for circuit breakers etc. would work internally.

As a stretch goal, we'd look to add an additional label, traefik.servicefabric.preferlocal, which would prefer routing to a local instance of a service so that, when used with mesh, traffic wouldn't leave the node unless necessary.

# Entrypoints definition
#
# Optional
# Default:
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.traefik]
address = ":8080"
[entryPoints.mesh]
address = ":7887"
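
To make the proposal concrete, here is a minimal Go sketch of how the mesh label could select an entry point. It is not the provider's actual code: the function name entryPointsFor is hypothetical, and it assumes a service labelled traefik.servicefabric.mesh=true is published only on the internal "mesh" entry point, while every other service stays on "http".

```go
package main

import (
	"fmt"
	"strings"
)

// meshLabel is the label name proposed in this issue.
// Everything else in this file is a hypothetical illustration.
const meshLabel = "traefik.servicefabric.mesh"

// entryPointsFor returns the entry points a service's frontend would bind to.
// Assumption: a service whose mesh label is "true" is reachable only on the
// internal "mesh" entry point; all other services use the "http" entry point.
func entryPointsFor(labels map[string]string) []string {
	if strings.EqualFold(labels[meshLabel], "true") {
		return []string{"mesh"}
	}
	return []string{"http"}
}

func main() {
	internal := map[string]string{meshLabel: "true"}
	external := map[string]string{"traefik.enable": "true"}

	fmt.Println("internal service:", entryPointsFor(internal))
	fmt.Println("external service:", entryPointsFor(external))
}
```

Whether a mesh service should also remain reachable on the external entry point is an open design choice; the sketch picks the stricter option.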

@AviMualem I know we chatted about this a bit earlier - does this sound like a good plan for you?

lawrencegripper avatar Apr 20 '18 16:04 lawrencegripper

Hey @lawrencegripper, first of all, let me start by saying I truly believe this can be an amazing piece of functionality for all Service Fabric users.

Just to add some background: before I started thinking about Traefik as a service mesh, I was thinking about using Traefik as an internal reverse proxy for my service-to-service communication. I had noticed that when the cluster nodes run Windows, the integrated reverse proxy is a thin, basic application, far from the fine-grained reverse proxy I would consider suitable for production use cases.

On Linux-based node deployments, the integrated reverse proxy is Envoy (which I find highly similar to Traefik), which is very different from the Windows-based reverse proxy app. I know from various sources that work is under way to include Envoy in Windows-based deployments as well. As a personal note, once both Traefik and Envoy are available I will probably choose Traefik, for various reasons :).

After I examined the features Traefik offers, which include circuit breakers, rate limiting, max connections, authorization, Let's Encrypt support, access logs, retry policies and more, I realized it is far more than a simple proxy that does URL manipulation, and that it has a lot of the functionality I want in my service mesh layer.

Before going into implementation and thinking about labels and entry points, I would start by checking with the Traefik team what their opinion is on using Traefik as a service mesh. After reading their documentation and a large portion of the code, it looks like they define Traefik more as a modern reverse proxy that handles communication from the external world into the backend implementation.

I can't see any samples or blogs talking about using it for internal service-to-service communication, although it's really easy to achieve: we already have service discovery, so what's left is just to have an entry point with a port that is not exposed to the external world.

In the rules associated with the internal entry point, I can easily get rate limiting, circuit breakers, retries and more for my internal service-to-service communication, all of which are without a doubt a mesh layer's responsibility.

I even took the time to check it in my dev cluster with 50+ microservices deployed to it, and it looked fine: 40 of them were connected to the internal entry point and 10 were exposed to the external world on an external entry point.

Now, when it comes to mesh solutions, there is a lot of discussion around the division between the data plane and the control plane. More info can be found here:

  • https://medium.com/microservices-learning/understanding-microservices-communication-and-service-mesh-e888d1adc41

  • https://blog.envoyproxy.io/service-mesh-data-plane-vs-control-plane-2774e720f7fc

Traefik is often associated with data plane features and, in my opinion, it covers some control plane features too.

Projects like Istio (https://istio.io/) have a real separation between the control plane and the data plane. Essentially, Istio uses Envoy as its data plane and provides its own implementation of the control plane.

As far as I can tell, it is legitimate to use Traefik as a mesh because, as I mentioned, it includes a lot of out-of-the-box functionality beyond routing and classic load balancing. On top of that, at the end of the day, Service Fabric Linux-based deployments are using Envoy in the same manner.

In my opinion it might not have the flexibility of something like Istio, but it can still be used as a mesh layer.

Now, regarding the option you are considering that would prefer staying on the same node for service-to-service communication (if both services are deployed on the same host): I'm not sure it's exactly the behavior we want. If the node is under a lot of load, it may be better to route the traffic to another node that hosts the desired service.
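
One way to reconcile local preference with the load concern above is to give the local instance a higher weight only while its node is below a load threshold. The Go sketch below illustrates that idea under stated assumptions: the Instance type, the Load metric (a value between 0 and 1), and the weight and threshold constants are all hypothetical and do not come from Traefik or Service Fabric.

```go
package main

import "fmt"

// Instance is a hypothetical view of one service replica.
type Instance struct {
	Node string  // name of the node hosting the replica
	Load float64 // assumed node load in the range [0, 1]
}

// Illustrative values only; real values would need tuning and testing.
const (
	localWeight   = 10  // weight for a healthy replica on the caller's node
	remoteWeight  = 1   // weight for every other replica
	loadThreshold = 0.8 // above this load the local replica loses its preference
)

// weights returns one load-balancing weight per instance, in input order.
// A replica on localNode is preferred only while its load is below the threshold.
func weights(localNode string, instances []Instance) []int {
	out := make([]int, len(instances))
	for i, inst := range instances {
		if inst.Node == localNode && inst.Load < loadThreshold {
			out[i] = localWeight
		} else {
			out[i] = remoteWeight
		}
	}
	return out
}

func main() {
	instances := []Instance{
		{Node: "node0", Load: 0.3},
		{Node: "node1", Load: 0.5},
	}
	// The caller runs on node0, so the first replica is local.
	fmt.Println("normal load:     ", weights("node0", instances))

	// Simulate the local node becoming overloaded.
	instances[0].Load = 0.95
	fmt.Println("local overloaded:", weights("node0", instances))
}
```

With weights rather than a hard "local only" rule, remote replicas still receive some traffic, and an overloaded local node simply falls back to even distribution.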

I would be happy to hear your opinion, and @jjcollinge's as well :) This development should be really precise, because if customers use it as a mesh, changing the design later will be a hard task :)

Avi.

AviMualem avatar Apr 25 '18 15:04 AviMualem