vertx-circuit-breaker icon indicating copy to clipboard operation
vertx-circuit-breaker copied to clipboard

Higher memory usage, blocked thread when using circuit breaker

Open jyotipatel opened this issue 2 years ago • 13 comments

Version

io.vertx vertx-circuit-breaker 4.3.1

Context

I am getting high memory usage with the circuit breaker. I am just using httpbin.org to get all success responses. For individual requests, it works fine. While running a load test the JVM old gen utilization is spiking up.

Reproducer

I have the main verticle code, pasting it here itself:

` public class CleanServer extends AbstractVerticle { Logger logger = Logger.getLogger(CleanServer.class.getName());

  @Override
  public void start(Promise<Void> startPromise) throws Exception {
      Router router = Router.router(vertx);
      CircuitBreakerCache cbc = new CircuitBreakerCache(vertx);
      router.route(HttpMethod.GET, "/get").handler(context -> {
          List<String> domains = context.queryParam("user");
          String domain = domains.get(0);
          CircuitBreaker cb = cbc.getCircuitBreaker(domain + context.request().path());
          HttpServerResponse serverResponse =
                  context.response().setChunked(true);
          cb.executeWithFallback(promise -> {
              WebClientOptions options = new WebClientOptions().setTryUseCompression(true).setTcpNoDelay(true).setTcpCork(true).setReceiveBufferSize(128).setConnectTimeout(400);
              WebClient client = WebClient.create(vertx, options);
              client.get(80, "httpbin.org", "/status/200")
                      .timeout(2000)
                      .send(ar -> {
                          if (ar.succeeded()) {
                              HttpResponse<Buffer> response = ar.result();
                              int statusCode = response.statusCode();
                              if (statusCode != 200) {
                                  promise.fail(response.statusMessage());
                              } else {
                                  serverResponse.end("Hello!!");
                                  promise.complete();
                              }
                          } else {
                              promise.fail(ar.cause().getMessage());
                          }
                      });
          }, v -> {
              // Executed when the circuit is opened
              logger.log(Level.INFO, domain + " Failed " + cb.state().toString() + " Error: Circuit open");
              serverResponse.setStatusCode(200).setStatusMessage("Circuit Open").end("Circuit Open");
              return context;
          });
      });

      // Create the HTTP server
      vertx.createHttpServer(new HttpServerOptions().setMaxInitialLineLength(10000))
              // Handle every request using the router
              .requestHandler(router)
              // Start listening
              .listen(8080)
              // Print the port
              .onSuccess(server ->
                      System.out.println(
                              "HTTP server started on port " + server.actualPort()
                      )
              );


  }

} `

Circuit breaker options: CircuitBreakerOptions() .setMaxFailures(50) .setTimeout(5000) .setFallbackOnFailure(true) .setResetTimeout(10000)));

Steps to reproduce

  1. API used: http://localhost:8080/get?user=abc
  2. When I am hitting the above API at 50 QPS for 30 minutes. The java heap is getting filled up.

Extra

<vertx.version>4.2.6</vertx.version>

JVM params used: -XX:+UseG1GC -Xms4g -Xmx4g -XX:InitiatingHeapOccupancyPercent=70 -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5

JVM memory with the load test. JVM

Error: WARNING: Thread Thread[vert.x-eventloop-thread-3,5,main] has been blocked for 3050 ms, time limit is 2000 ms

I think I am blocking the thread somewhere but not sure where exactly as the code seems pretty simple as given in the documentation.

jyotipatel avatar Jun 20 '22 13:06 jyotipatel

I've seen this issue as well, at ~ 4 PM, I deployed a new version of our application without the circuit breaker and the memory consumption wasn't ever-growing anymore as before.

image-2022-06-21-08-52-55-458

pierDipi avatar Jun 21 '22 08:06 pierDipi

An heap dump showed that there were many byte[] referenced by VertxUnsafeHeapByteBuf, that reference a HttpResponseImpl which is what our application returns as result of the VertxCircuitBreaker.execute() method (it's a Future<HttpResponse<Buffer>>:

image

pierDipi avatar Jun 21 '22 08:06 pierDipi

Thanks, I'll take a look later this week

tsegismont avatar Jun 21 '22 08:06 tsegismont

Do you have any update on this? @tsegismont

jyotipatel avatar Jun 29 '22 07:06 jyotipatel

@jyotipatel sorry not yet. I should be able to look into it next week

tsegismont avatar Jun 29 '22 09:06 tsegismont

In the mean time, is there a way to disable metrics altogether? As in the heap dump, I see histogram is taking the highest amount of memory image

jyotipatel avatar Jun 29 '22 14:06 jyotipatel

No. I believe we should disable the CircuitBreakerMetrics when eventbus notifications are disabled.

tsegismont avatar Jun 29 '22 14:06 tsegismont

I tried reducing the DEFAULT_METRICS_ROLLING_BUCKETS and the full GC trigger is delayed with it. As the old generation memory heap is still increasing but with slower rate this time.

So, the issue seems to be with the CircuitBreakerMetrics , more specifically HdrHistogram. Thought this might help you in the fix @tsegismont .

jyotipatel avatar Jun 30 '22 08:06 jyotipatel

Thanks for sharing your findings

tsegismont avatar Jun 30 '22 08:06 tsegismont

@jyotipatel I wasn't able to reproduce using 4.2.6 or 4.3.1. Perhaps you can share with us a full reproducer as a GH project? Ideally it would have runnable server code and load testing instructions.

I have a couple remarks regarding the snippet.

Why do you create one WebClient per remote call? Usually a single web client per verticle is often enough. Also, it looks unusual to create a circuit breaker per request path. Why do you need so many of them

I've sent a PR to disable metrics computation anyway: https://github.com/vert-x3/vertx-circuit-breaker/pull/61

tsegismont avatar Aug 10 '22 13:08 tsegismont

@pierDipi your memory problem looks different, can you share with us a small reproducer or at least the heap dump file?

tsegismont avatar Aug 10 '22 13:08 tsegismont

@tsegismont Thanks for looking into it. I will add the reproducer by end of the next week. I will also try to reduce the client creation. Our test case is to break the circuit for a particular user in case the failure is for that user only. Thus we need to have a circuit breaker per path and user_id combination. user_id will be a URL parameter in our case.

jyotipatel avatar Aug 18 '22 05:08 jyotipatel

@tsegismont I have shared the reproducer code in github with you as a collaborator: https://github.com/jyotipatel/vertx-circuit-breaker-httpbin. Please take a look.

jyotipatel avatar Aug 26 '22 14:08 jyotipatel

@jyotipatel can you please send the invitation again, it has expired. Thank you

tsegismont avatar Sep 19 '22 08:09 tsegismont

Sent again @tsegismont

jyotipatel avatar Sep 20 '22 13:09 jyotipatel

Thank you @jyotipatel

tsegismont avatar Sep 20 '22 16:09 tsegismont

@jyotipatel I have been able to reproduce and found the issue is in the CircuitBreakerCache implementation. It creates a CircuitBreaker any time the hash map getOrDefault method is invoked. But then this instance is never collected (even if the key is already in the map) because the circuit breaker schedules a periodic timer that keeps a reference to it.

In order to keep a reasonable number of circuit breaker implementations in your app, change your implementation to:

    public CircuitBreaker getCircuitBreaker(String key) {
        CircuitBreaker cb = circuitBreakers.get(key);
        if (cb == null) {
            cb = CircuitBreaker.create(key, vertx,
                    new CircuitBreakerOptions()
                            .setMaxFailures(50)
                            .setTimeout(5000)
                            .setFallbackOnFailure(true)
                            .setResetTimeout(10000));
            circuitBreakers.put(key, cb);
        }
        return cb;
    }

And I would recommend again to avoid creating a WebClient for each incoming request.

tsegismont avatar Sep 27 '22 21:09 tsegismont