Integration with MP fault tolerance for load balancers

Open mswiderski opened this issue 3 years ago • 4 comments

Load balancers play an important role in routing traffic from clients to service endpoints. These endpoints might be failing in ways that service discovery cannot detect. For example, the services themselves may not be equipped with health checks, so service providers cannot ensure their availability.

Utilising fault tolerance's circuit breaker can bring significant value in dealing with failing server instances, which can be temporarily marked as failing.

At the same time, the load balancer can try the other available instances before failing the service call. This would leave the client code less affected, as long as there are still some server instances that can handle the call.

@michalszynkiewicz

mswiderski, Nov 16 '21 12:11

A load balancer with circuit breaker should be relatively simple to implement and it's definitely worth it.

I'm not sure Stork is a good place to implement automatic retry on a different endpoint, we'll think about it.

michalszynkiewicz, Nov 17 '21 14:11

I'm not sure Stork is a good place to implement automatic retry on a different endpoint, we'll think about it.

Isn't that what a circuit breaker already does?

mswiderski, Nov 17 '21 15:11

Usually a circuit breaker is just about whether communication with a remote endpoint should be allowed or not.

It keeps an open, half-open or closed state for an endpoint (in MP FT, for a method), and if the state is open, it throws an error on any attempt to use it.

With Stork we can keep this state per ServiceInstance. The problem with retrying at the Stork level is that it's not Stork but the client libraries using it that actually make the request. We need to figure out how to make this work and whether we really want Stork to be involved.
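
To make the per-instance state concrete, here is a minimal sketch of a circuit breaker kept per instance id. Everything in it is an assumption for illustration: `InstanceCircuitBreaker`, `CircuitBreakerRegistry`, the failure threshold and the open duration are hypothetical names and parameters, not part of Stork or MP Fault Tolerance. It also simplifies the half-open state by letting every call through until a result is recorded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-instance circuit breaker; not a Stork or MP FT class. */
class InstanceCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before probing
    private final LongSupplier clock;   // injected time source, in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    InstanceCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns false while the breaker is open; moves to half-open once the delay has passed. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // allow probe calls to test the instance
        }
        return state != State.OPEN;
    }

    /** A successful call closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failed probe, or too many consecutive failures, opens the breaker. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}

/** Keeps one breaker per instance id, so state is tracked per service instance. */
class CircuitBreakerRegistry {
    private final Map<Long, InstanceCircuitBreaker> breakers = new ConcurrentHashMap<>();

    InstanceCircuitBreaker forInstance(long instanceId) {
        // Threshold of 3 failures and a 5 second open window are arbitrary example values.
        return breakers.computeIfAbsent(instanceId,
                id -> new InstanceCircuitBreaker(3, 5_000, System::currentTimeMillis));
    }
}
```

A load balancer built on this would skip any instance whose breaker returns false from `allowRequest()`, and the client library would still need a way to report each call's outcome back, which is the open question here.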

michalszynkiewicz, Nov 17 '21 17:11

But Stork provides the load balancer, and I think such checks (whether an endpoint is good or bad) should be handled by the load balancer. So it could be done at the level of the integration with the client that uses it: the client provides feedback that a service instance failed, and the load balancer then provides another service instance to try before failing. It should only treat specific errors, such as timeouts or network problems, as failures; regular HTTP response codes should count as successful calls, I think.

So yes, requires some thinking on how to approach it :)
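
One way the feedback loop described above could look, as a rough single-threaded sketch: the client call is passed in as a function, an `IOException` (timeouts, network problems) marks the instance as failed and moves on to the next one, and any normally returned value, including an HTTP error status, counts as a successful call. `FailoverCaller` and its `Call` interface are hypothetical names for illustration, not Stork API, and instances are plain strings here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical round-robin balancer that fails over on I/O errors only. */
class FailoverCaller {
    /** The client's request against one instance; throws IOException on network failure. */
    interface Call<T> {
        T apply(String instance) throws IOException;
    }

    private final List<String> instances;
    private final Set<String> failed = new HashSet<>();
    private int next; // round-robin cursor

    FailoverCaller(List<String> instances) {
        this.instances = new ArrayList<>(instances);
    }

    /** Tries each instance not yet marked as failed, at most once, before giving up. */
    <T> T call(Call<T> request) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < instances.size(); attempt++) {
            String instance = instances.get(next);
            next = (next + 1) % instances.size();
            if (failed.contains(instance)) {
                continue; // skip instances already reported as failing
            }
            try {
                // Any returned value, even an HTTP 500 status, is treated as success.
                return request.apply(instance);
            } catch (IOException e) {
                failed.add(instance); // feedback from the client: this instance failed
                last = e;
            }
        }
        throw last != null ? last : new IOException("no healthy instance available");
    }

    /** Returns a copy of the instances currently marked as failed. */
    Set<String> failedInstances() {
        return new HashSet<>(failed);
    }
}
```

In this sketch a failed instance is never retried; combining it with a circuit breaker's half-open state would let instances recover.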

mswiderski, Nov 17 '21 18:11