micronaut-core icon indicating copy to clipboard operation
micronaut-core copied to clipboard

@CircuitBreaker cache failed result when gate open

Open o-shevchenko opened this issue 2 years ago • 22 comments

Expected Behavior

The first request fails but second pass

Actual Behaviour

The first request fails but the second returns the same error as the first request.

Steps To Reproduce

Similar to the issue described here: https://github.com/micronaut-projects/micronaut-rxjava3/issues/87. Not sure if it's only related to reactive libs. Let me know if such bug should be open in core instead.

Hi I faced the same problem for CircuitBreaker when I use Reactor Publisher. I have two tests. First, pass the wrong parameters and I get io.micronaut.http.client.exceptions.HttpClientResponseException BadRequest and this is expected, but the second test pass the correct parameters and should pass but it fails with the same error and parameters passed in the first tests. When I change the order of tests everything works fine. When I remove CircuitBreaker tests work fine as well. Currently, I disabled CircuitBreaker by using excludes = [HttpClientResponseException::class] since I don't need to do retries if parameters are incorrect. But looks like CircuitBreaker has some bug and it caches results or keeps the gate open and returns the previous request to another consumer which is really bad since it is hard to debug and moreover it might be a security violation.

@Client(
    value = "\${url}",
    path = "\${path}",
    errorType = ErrorInfo::class
)
@CircuitBreaker(
    attempts = "\${retry.attempts}",
    delay = "\${retry.delay}",
    multiplier = "\${retry.multiplier}",
    reset = "\${retry.reset}"
)
@Headers(Header(name = ACCEPT, value = "application/json"))
interface MyClient {

    @Post("/test")
    fun test(@Body dto: Dto): Publisher<Any>
}
....
@Post(uri = "/test")
    fun test(
        @Body dto: dto
    ): Publisher<MutableHttpResponse<Any>> {
        return service.test(dto)
            .toMono()
            .map { HttpResponse.ok(it) }
            .defaultIfEmpty(HttpResponse.ok())
            .onErrorResume(TestException::class.java) { exception ->
                Mono.just(HttpResponse.badRequest(JsonError(exception.message)))
            }
    }
  1. Create a reactive Controller and Client
  2. Make a request using a client which will fail with HttpClientResponseException. Example: pass some parameters, so your controller returns BadRequest (HttpClientResponseException will be thrown by DefaultHttpClient automatically)
  3. Make a second request with the correct parameters, so the controller should return HttpStatus.OK. BUT it will return the same error as for the first request even we already changed parameters.

Environment Information

No response

Example Application

No response

Version

3.5.0

o-shevchenko avatar Jun 10 '22 09:06 o-shevchenko

@jameskleeh Could you please take a look?

o-shevchenko avatar Jun 10 '22 09:06 o-shevchenko

Can you please create a sample project that reproduces it

dstepanov avatar Jun 10 '22 09:06 dstepanov

@dstepanov Please, just run DemoTest https://github.com/o-shevchenko/micronaut_reactor_circuitbreaker_bug image image

If you run testPass first both test will pass: image

o-shevchenko avatar Jun 10 '22 10:06 o-shevchenko

If I remove CircuitBreaker or add excludes = [HttpClientResponseException::class] tests work

@CircuitBreaker(
    attempts = "\${controller.retry.attempts}",
    delay = "\${controller.retry.delay}",
    multiplier = "\${controller.retry.multiplier}",
    reset = "\${controller.retry.reset}",
    excludes = [HttpClientResponseException::class]
)

o-shevchenko avatar Jun 10 '22 10:06 o-shevchenko

Added body to response to show that data from first request was returned to second image

o-shevchenko avatar Jun 10 '22 11:06 o-shevchenko

@dstepanov Are you able to reproduce this issue using tests?

o-shevchenko avatar Jun 14 '22 07:06 o-shevchenko

Yes, I can reproduce it. It looks like CircuitBreakerRetry is returning the last exception.

dstepanov avatar Jun 14 '22 09:06 dstepanov

yes, it returns the last exception (with the last body) even if we already send new data via the client. so, CircuitBreaker returns data that belongs from one caller to another. That was hard to realize

o-shevchenko avatar Jun 14 '22 09:06 o-shevchenko

Probably, the gate should be closed when we reach out from another caller but it still open

o-shevchenko avatar Jun 14 '22 09:06 o-shevchenko

What do you mean by another caller?

dstepanov avatar Jun 14 '22 10:06 dstepanov

I mean that looks very dangerous when I can call a client from one test method with different parameters but get results from the previous call even from another test. I think about the situation when we have a production system and implemented REST client to talk between microservices that will be used in a multi-tenant environment. I'm wondering if some users can get receive an error that was caused by a call from another user. Anyway, do you know where the issue occurs? Do you have any workaround in mind before fix?

o-shevchenko avatar Jun 14 '22 10:06 o-shevchenko

Yes, I think it should be changed to throw a basic exception, maybe add an option for @CircuitBreaker. The code in CircuitBreakerRetry.java needs to be changed, maybe you can come up with a PR?

dstepanov avatar Jun 14 '22 10:06 dstepanov

I think it should be changed to throw a basic exception

I'm not sure why it should throw an exception for the second call at all. Both tests should pass and the state should not be shared between them. Probably, I have to remove CircuitBreaker and use just Retriable to don't operate with gate state.

o-shevchenko avatar Jun 14 '22 10:06 o-shevchenko

The idea behind the circuit breaker is to stop making requests when there are multiple errors. So you "closed" the resource and the next request automatically gets the same error for the duration of the reset value.

dstepanov avatar Jun 14 '22 11:06 dstepanov

Right, it makes sense when service is not available. But currently, it does retries and closes gate even when we just return our own exception or just 400 HTTP status. It looks not right, when some clients needs to receive 400 status gate is closed for all other clients. That's not the case of CircuitBreaker usage. I added excludes = [HttpClientResponseException::class] to avoid CircuitBreaker retries for such cases, but not sure if it's correct and if we can get the same situation with other types of exception or vice versa skip CircuitBreaker when service is really unavailable.

o-shevchenko avatar Jun 14 '22 12:06 o-shevchenko

Return 400 works fine for not-reactive clients. But there is a problem with reactive clients because DefaultHttpClient always throws HttpClientResponseException, so CircuitBreaker is starting retries and closing the gate for other requests. This issue is related to the topic I was asking about here: https://github.com/micronaut-projects/micronaut-core/discussions/7558

o-shevchenko avatar Jun 14 '22 13:06 o-shevchenko

I see, maybe you can add RetryPredicate in @Retryable for proper understanding of the exception. I think we would need to add something to improve the current behavior.

dstepanov avatar Jun 14 '22 13:06 dstepanov

@dstepanov Any updates on that? Retries for HttpClientResponseException and CircuitBreaker make reactive client unusable. Migration to reactive Mongo Data and Reactor is very hard

o-shevchenko avatar Jun 29 '22 18:06 o-shevchenko

Do you have any plans to include this improvement in future releases? We got rid of CircuitBreaker to continue using the reactive Mongo Data client and Reactor, but we'd like to bring it back over time. Thanks!

o-shevchenko avatar Dec 27 '22 08:12 o-shevchenko

We can change the existing behavior in v4. Can you please create a PR with a test of your scenario?

I think replaying the last exception is not the best idea by default and should be optional in v4.

dstepanov avatar Dec 27 '22 12:12 dstepanov

Returning the response body from a previous request to another request is definitely not desirable behavior. But I think CircuitBreaker should be improved not to close the gate for some error codes. For example, a controller throws 400 error for some payload (e.g. validation failed), and as a result, retries started (but they shouldn't in case we have handled an exception and throw own with the needed error code) and the gate will be closed, and a subsequent request even with valid data will be declined. That's why we were disabling retries for HttpClientResponseException. The interesting part is that we see this problem only with the reactive client (See GitHub repo that I created and attached with tests to reproduce the issue), not-reactive client seems to work fine.

o-shevchenko avatar Dec 27 '22 12:12 o-shevchenko

Any updates here?

o-shevchenko avatar Mar 07 '24 17:03 o-shevchenko