armeria
armeria copied to clipboard
Add `CircuitBreakerService`
Sometimes, if a service is unhealthy, users might want to automatically stop receiving new incoming requests from clients and open a circuit for the service URL. This could be useful to prevent an outage of a service from making a server unhealthy. Because many services could share common resources such as a blocking task executor. The outage could be propagated to the server in an instant.
So if we provide a CircuitBreakerService
, the CircuitBreakerService
could make a server more resilient.
- If a service returns a number of 5xx server errors than a threadhold,
a
CircuitBreakerService
opens a circuit and rejects new requests, and drops the connections. - A
CircuitBreakerService
MAY selectly opens a circuit for a certain remote peer such as IP address or domain name.- The IPs or domains may be abusers or trying DDOS attacks.
We could reuse a lot of code of CircuitBreaker
module on the client-side.
Just a though - We may want to take advantage of this chance to see if we can make use of resilience4j.
this chance to see if we can make use of resilience4j.
That is a nice idea.
Hi @ikhoon 👋,
If this issue is still valid. I'd like to work on this. Could you please assign it to me? Thank you.
For sure! Thanks for your attention to this.
Are we thinking about an implementation similar to ThrottlingService
that also comes with a decorator?
https://armeria.dev/docs/advanced-production-checklist/
We already have circuit breakers recommended in the production checklist for clients. I think it is a great idea to add circuit breaking on server side. I really liked the idea because of the reasons you provide and based on our post-mortem.
Let's say one of our endpoints started taking too long time to process requests and everything suddenly started timing out. There can be several reasons:
- An external API or DB is down.
- Suddenly clients decided to send a lot more requests than usual.
- Suddenly clients decided to send much much bigger requests than usual.
- A buggy deployment of application causing that specific endpoint to timeout.
If things are running in event loop and let's say waiting tasks are implemented as blocking operations, ideally there should be no blocking operations but we don't want to take the whole app down just because an endpoint has buggy implementation. Circuit breaker should fail fast after that specific endpoint times out.
If there are no blocking operations in event loop, still we will spend lots of CPU cycles in high throughput, huge payload or buggy deployment scenarios. Circuit breaker should pull the rest of the endpoints into a somewhat healthy state.
If we have blockingTaskExecutor
enabled, having concurrent calls is not a huge issue but the thread pool can be depleted (Not sure what is the default thread pool size) meaning other tasks to not have enough threads to operate safely. Even in this scenario, server should prevent itself by short circuiting those calls to that endpoint. This also applies to very long running operations that deplete CPU and make other endpoints slow too.
I believe we wouldn't see this behavior if we had circuit breaking service in our app too: https://discord.com/channels/1087271586832318494/1087272728177942629/1131268159907844307
Anyway, long story short, I am interested in implementing this. I can provide a sample PR soon.
I think we should be careful about decorator orders while implementing this too. Ideally circuit breaker should be the outermost decorator so we are not having performance issues caused by other decorators, i.e. logging decorator.
We can re-use the code from exiting CircuitBreaker
but it is bound to a client request context. Should we move away from the client oriented circuit breaker and use a circuit breaker that takes a request context?
It can induce some breaking changes as we move circuit breaker away from client
package to common
and introduce two kinds of circuit breakers such as ServerCircuitBreaker
and ClientCircuitBreaker
maybe using the syntax CircuitBreaker.ofClient()
etc.
Please let me know if it is the right direction and we should lean towards that.
We can re-use the code from exiting CircuitBreaker but it is bound to a client request context. Should we move away from the client oriented circuit breaker and use a circuit breaker that takes a request context?
Good question. IIRC, CircuitBreaker
does not depend on ClientRequestContext
but some classes in circuitbreaker
are bound to ClientRequestContext
. We designed CircuitBreaker
as a general utility so that users can use it in any situation apart from CircuitBreakerClient
.
It can induce some breaking changes as we move circuit breaker away from client package to common and introduce two kinds of circuit breakers such as ServerCircuitBreaker and ClientCircuitBreaker maybe using the syntax CircuitBreaker.ofClient() etc.
We don't make breaking changes for API that is not marked as @UnstableApi
. We can copy CircuitBreaker
to common
and make the original CircuitBreaker
extends the one in common
.
package com.linecorp.armeria.common.circuitbreaker;
public interface CircuitBreaker {
...
}
package com.linecorp.armeria.client.circuitbreaker;
@Deprecate
public interface CircuitBreaker extends com.linecorp.armeria.common.circuitbreaker.CircuitBreaker {
@Override
...
}
We also need to deprecate some interfaces or classes that are also needed on the server-side.
If some of them depend on ClientRequestContext
, we need to change it to RequestContext
.
Good idea, I did not think about using the same name in common
packege. I will get into it ASAP 🙂