armeria Add `CircuitBreakerService`

Sometimes, if a service is unhealthy, users might want to automatically stop receiving new incoming requests from clients and open a circuit for the service URL. This could be useful to prevent an outage of a service from making a server unhealthy. Because many services could share common resources such as a blocking task executor. The outage could be propagated to the server in an instant.

So if we provide a CircuitBreakerService, the CircuitBreakerService could make a server more resilient.

If a service returns a number of 5xx server errors than a threadhold, a CircuitBreakerService opens a circuit and rejects new requests, and drops the connections.
A CircuitBreakerService MAY selectly opens a circuit for a certain remote peer such as IP address or domain name.
- The IPs or domains may be abusers or trying DDOS attacks.

We could reuse a lot of code of CircuitBreaker module on the client-side.

Jul 24 '21 07:07 ikhoon

Just a though - We may want to take advantage of this chance to see if we can make use of resilience4j.

Jul 26 '21 09:07 trustin

this chance to see if we can make use of resilience4j.

That is a nice idea.

Jul 27 '21 04:07 ikhoon

Hi @ikhoon 👋,

If this issue is still valid. I'd like to work on this. Could you please assign it to me? Thank you.

Jul 15 '22 12:07 klurpicolo

For sure! Thanks for your attention to this.

Jul 17 '22 12:07 ikhoon

Are we thinking about an implementation similar to ThrottlingService that also comes with a decorator?

https://armeria.dev/docs/advanced-production-checklist/

We already have circuit breakers recommended in the production checklist for clients. I think it is a great idea to add circuit breaking on server side. I really liked the idea because of the reasons you provide and based on our post-mortem.

Let's say one of our endpoints started taking too long time to process requests and everything suddenly started timing out. There can be several reasons:

An external API or DB is down.
Suddenly clients decided to send a lot more requests than usual.
Suddenly clients decided to send much much bigger requests than usual.
A buggy deployment of application causing that specific endpoint to timeout.

If things are running in event loop and let's say waiting tasks are implemented as blocking operations, ideally there should be no blocking operations but we don't want to take the whole app down just because an endpoint has buggy implementation. Circuit breaker should fail fast after that specific endpoint times out.

If there are no blocking operations in event loop, still we will spend lots of CPU cycles in high throughput, huge payload or buggy deployment scenarios. Circuit breaker should pull the rest of the endpoints into a somewhat healthy state.

If we have blockingTaskExecutor enabled, having concurrent calls is not a huge issue but the thread pool can be depleted (Not sure what is the default thread pool size) meaning other tasks to not have enough threads to operate safely. Even in this scenario, server should prevent itself by short circuiting those calls to that endpoint. This also applies to very long running operations that deplete CPU and make other endpoints slow too.

I believe we wouldn't see this behavior if we had circuit breaking service in our app too: https://discord.com/channels/1087271586832318494/1087272728177942629/1131268159907844307

Anyway, long story short, I am interested in implementing this. I can provide a sample PR soon.

I think we should be careful about decorator orders while implementing this too. Ideally circuit breaker should be the outermost decorator so we are not having performance issues caused by other decorators, i.e. logging decorator.

Jul 22 '23 17:07 Dogacel

We can re-use the code from exiting CircuitBreaker but it is bound to a client request context. Should we move away from the client oriented circuit breaker and use a circuit breaker that takes a request context?

It can induce some breaking changes as we move circuit breaker away from client package to common and introduce two kinds of circuit breakers such as ServerCircuitBreaker and ClientCircuitBreaker maybe using the syntax CircuitBreaker.ofClient() etc.

Please let me know if it is the right direction and we should lean towards that.

Jul 26 '23 07:07 Dogacel

We can re-use the code from exiting CircuitBreaker but it is bound to a client request context. Should we move away from the client oriented circuit breaker and use a circuit breaker that takes a request context?

Good question. IIRC, CircuitBreaker does not depend on ClientRequestContext but some classes in circuitbreaker are bound to ClientRequestContext. We designed CircuitBreaker as a general utility so that users can use it in any situation apart from CircuitBreakerClient.

It can induce some breaking changes as we move circuit breaker away from client package to common and introduce two kinds of circuit breakers such as ServerCircuitBreaker and ClientCircuitBreaker maybe using the syntax CircuitBreaker.ofClient() etc.

We don't make breaking changes for API that is not marked as @UnstableApi. We can copy CircuitBreaker to common and make the original CircuitBreaker extends the one in common.

package com.linecorp.armeria.common.circuitbreaker;
public interface CircuitBreaker {
   ...
}

package com.linecorp.armeria.client.circuitbreaker;
@Deprecate
public interface CircuitBreaker extends com.linecorp.armeria.common.circuitbreaker.CircuitBreaker {
   @Override
   ...
}

We also need to deprecate some interfaces or classes that are also needed on the server-side. If some of them depend on ClientRequestContext, we need to change it to RequestContext.

Jul 26 '23 08:07 ikhoon

Good idea, I did not think about using the same name in common packege. I will get into it ASAP 🙂

Jul 26 '23 11:07 Dogacel

armeria armeria copied to clipboard

Add `CircuitBreakerService`

armeria
armeria copied to clipboard