vertx-sql-client

Metrics vertx_pool_queue_pending doesn't decrease after connection loss

Open anvx opened this issue 10 months ago • 4 comments

Version

The latest release, which is 4.5.13 at the time of writing.

Context

We have encountered two metrics issues while using the non-blocking Postgres DB driver with Vert.x. I am reporting both in one ticket because I believe they are tightly coupled, and fixing one may well fix the other.

Issue 1: vertx_pool_queue_pending doesn't decrease after connection loss

When we have pending queries (vertx_pool_queue_pending{pool_type="sql",}) and the database connection is lost (due to a DB restart, network glitch, etc.), the vertx_pool_queue_pending metric remains stuck. It never goes below the value recorded at the time of connection loss.

In the metrics graph it therefore looks as if there are always pending queries waiting for a connection, even when the database connection is restored immediately. The only way to clear the stuck value is to restart the service.

I've reviewed VertxPoolMetrics and related classes, but it's unclear where the issue lies. Notably, any queries that were pending when the connection was lost are never executed after reconnection.

Issue 2: vertx_pool_queue_pending freezes with high load

We also observed that when sending a high volume of requests, the vertx_pool_queue_pending metric does not decrease correctly.

Do you have a reproducer?

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.json.JsonObject;
import io.vertx.junit5.VertxExtension;
import io.vertx.junit5.VertxTestContext;
import io.vertx.micrometer.MetricsService;
import io.vertx.micrometer.MicrometerMetricsOptions;
import io.vertx.micrometer.VertxPrometheusOptions;
import io.vertx.pgclient.PgBuilder;
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.sqlclient.Pool;
import io.vertx.sqlclient.PoolOptions;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.testcontainers.containers.PostgreSQLContainer;

@ExtendWith(VertxExtension.class)
public class PostgresTest {

  private static PostgreSQLContainer<?> postgresContainer;
  private static Vertx vertx;
  private static Pool pgPool;

  @BeforeAll
  static void setup() {
    postgresContainer = new PostgreSQLContainer<>("postgres")
      .withDatabaseName("testdb")
      .withUsername("user")
      .withPassword("password");
    postgresContainer.start();

    MicrometerMetricsOptions metricsOptions = new MicrometerMetricsOptions()
      .setPrometheusOptions(new VertxPrometheusOptions()
        .setEnabled(true)
        .setStartEmbeddedServer(true)
        .setEmbeddedServerOptions(new io.vertx.core.http.HttpServerOptions().setPort(8081))
        .setPublishQuantiles(true))
      .setEnabled(true);

    vertx = Vertx.vertx(new VertxOptions().setMetricsOptions(metricsOptions));

    PgConnectOptions connectOptions = new PgConnectOptions()
      .setPort(postgresContainer.getFirstMappedPort())
      .setHost(postgresContainer.getHost())
      .setDatabase(postgresContainer.getDatabaseName())
      .setUser(postgresContainer.getUsername())
      .setPassword(postgresContainer.getPassword());

    PoolOptions poolOptions = new PoolOptions().setMaxSize(5);

    pgPool = PgBuilder.pool()
      .with(poolOptions)
      .connectingTo(connectOptions)
      .using(vertx)
      .build();
  }


  @Test
  void testDatabaseConnection(VertxTestContext testContext) throws InterruptedException {
    for (int i = 0; i < 300_000; i++) {
      pgPool.withTransaction(sqlConnection ->
              sqlConnection.query("SELECT PG_SLEEP(5)").execute()
      );
    }

    MetricsService metricsService = MetricsService.create(vertx);
    for (int i = 0; i < 1_000_000; i++) {
      Thread.sleep(1000);
      JsonObject metricsSnapshot = metricsService.getMetricsSnapshot();
      // Print each metric name and its snapshot value with a separator so the log is readable
      System.out.println("vertx.pool.in.use = " + metricsSnapshot.getString("vertx.pool.in.use"));
      System.out.println("vertx.pool.queue.pending = " + metricsSnapshot.getString("vertx.pool.queue.pending"));
      System.out.println("=======");
    }
  }

  @AfterAll
  static void tearDown() {
    if (pgPool != null) {
      pgPool.close();
    }
    if (vertx != null) {
      vertx.close();
    }
    if (postgresContainer != null) {
      postgresContainer.stop();
    }
  }

}

Steps to reproduce

Run the test above and watch the metrics it prints to the log every second.

Observed Behavior

  • We create 300,000 requests, which immediately fill up vertx.pool.queue.pending (except for the 5 connections actively processing queries).
  • Once all requests are added to the queue, we start printing metrics every second.
  • After about a minute, vertx.pool.in.use drops to 0, meaning no queries are actively being processed.
  • However, vertx.pool.queue.pending freezes at around 299,970 and never decreases.
  • Any new requests increase the pending count from this frozen value, rather than resetting.
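
The frozen state described in the list above (an idle pool together with a pending value that stays non-zero and unchanged) can be expressed as a small check over successive samples. This is only a sketch: `StuckGaugeCheck`, `Sample` and `looksStuck` are hypothetical names that are not part of Vert.x or Micrometer, and the sample values are illustrative rather than measured.

```java
import java.util.List;

class StuckGaugeCheck {
  /** One reading of the two gauges, taken at a fixed interval (hypothetical type). */
  static final class Sample {
    final long inUse;
    final long pending;

    Sample(long inUse, long pending) {
      this.inUse = inUse;
      this.pending = pending;
    }
  }

  /**
   * Returns true when the last `window` samples all show an idle pool
   * (inUse == 0) together with the same non-zero pending value.
   */
  static boolean looksStuck(List<Sample> samples, int window) {
    if (window <= 0 || samples.size() < window) {
      return false;
    }
    List<Sample> tail = samples.subList(samples.size() - window, samples.size());
    long frozen = tail.get(0).pending;
    if (frozen == 0) {
      return false; // an empty queue on an idle pool is the healthy state
    }
    for (Sample s : tail) {
      if (s.inUse != 0 || s.pending != frozen) {
        return false; // still working, or the queue is still moving
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Illustrative values only: the queue drains a little, then freezes while idle.
    List<Sample> samples = List.of(
      new Sample(5, 299_995),
      new Sample(5, 299_980),
      new Sample(0, 299_970),
      new Sample(0, 299_970),
      new Sample(0, 299_970));
    System.out.println("looks stuck: " + looksStuck(samples, 3));
  }
}
```

A check like this could be fed from the once-per-second snapshot loop in the reproducer, so the test can fail on its own instead of relying on someone reading the log.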

anvx, Mar 04 '25 12:03

We see this bug as well. In our case, it is caused by connection timeouts.

If a connection is requested from the pool but the request times out, metrics.rejected() is never called. As a result, the pending counter is never decremented for that request.

Relevant part of the code: SqlConnectionPool, the onEnqueue method

https://github.com/eclipse-vertx/vertx-sql-client/blob/master/vertx-sql-client/src/main/java/io/vertx/sqlclient/impl/pool/SqlConnectionPool.java#L255
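
To illustrate the mechanism, here is a toy model of a pending counter. It is a sketch under stated assumptions, not the actual SqlConnectionPool or Vert.x metrics code: `ToyPoolMetrics`, `PendingLeakDemo` and their methods are hypothetical stand-ins, and the only thing taken from this thread is the idea that a timed-out waiter must be reported as rejected.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stand-in for a pool metrics object; NOT the Vert.x metrics API. */
class ToyPoolMetrics {
  private final AtomicInteger pending = new AtomicInteger();

  /** A request entered the wait queue. */
  void submitted() {
    pending.incrementAndGet();
  }

  /** A queued request left the queue without ever obtaining a connection. */
  void rejected() {
    pending.decrementAndGet();
  }

  int pending() {
    return pending.get();
  }
}

class PendingLeakDemo {
  /**
   * Queues `requests` waiters and then times all of them out.
   * When reportRejection is false, the timeout path forgets to notify the metrics.
   */
  static int pendingAfterTimeouts(int requests, boolean reportRejection) {
    ToyPoolMetrics metrics = new ToyPoolMetrics();
    for (int i = 0; i < requests; i++) {
      metrics.submitted();
    }
    for (int i = 0; i < requests; i++) {
      // The waiter times out here and is dropped from the (imaginary) queue.
      if (reportRejection) {
        metrics.rejected();
      }
    }
    return metrics.pending();
  }

  public static void main(String[] args) {
    System.out.println("timeouts not reported: pending = " + pendingAfterTimeouts(10, false));
    System.out.println("timeouts reported:     pending = " + pendingAfterTimeouts(10, true));
  }
}
```

In this model the queue is empty in both runs, but only the run that reports each timeout brings the counter back to zero. That matches the symptom in the issue, where the gauge keeps the value it had when the waiters were dropped.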

dennisschroer, Mar 31 '25 12:03

Thanks to both of you for your feedback

tsegismont, Mar 31 '25 12:03

We see the same behavior with the vertx_pool_in_use metric: it only decreases if the connection was closed successfully.
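
A minimal sketch of the difference, using plain `CompletableFuture` rather than Vert.x futures. `InUseGauge` and `InUseGaugeDemo` are hypothetical names, and this is not how the pool is actually implemented; it only contrasts decrementing on a successful close with decrementing on any outcome.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-use gauge; not part of Vert.x. */
class InUseGauge {
  private final AtomicInteger inUse = new AtomicInteger();

  void acquired() {
    inUse.incrementAndGet();
  }

  int value() {
    return inUse.get();
  }

  /** Decrements only when the close future completes successfully. */
  void releaseOnSuccessOnly(CompletableFuture<Void> close) {
    close.thenRun(inUse::decrementAndGet);
  }

  /** Decrements whether the close future succeeds or fails. */
  void releaseAlways(CompletableFuture<Void> close) {
    close.whenComplete((ok, err) -> inUse.decrementAndGet());
  }
}

class InUseGaugeDemo {
  /** Acquires one connection, then fails its close; returns the gauge value afterwards. */
  static int inUseAfterFailedClose(boolean releaseAlways) {
    InUseGauge gauge = new InUseGauge();
    gauge.acquired();
    CompletableFuture<Void> close = new CompletableFuture<>();
    if (releaseAlways) {
      gauge.releaseAlways(close);
    } else {
      gauge.releaseOnSuccessOnly(close);
    }
    // Simulate a close that fails, e.g. because the socket is already gone.
    close.completeExceptionally(new RuntimeException("connection reset"));
    return gauge.value();
  }

  public static void main(String[] args) {
    System.out.println("success-only release: in use = " + inUseAfterFailedClose(false));
    System.out.println("release on any outcome: in use = " + inUseAfterFailedClose(true));
  }
}
```

With the success-only callback, a failed close leaves the gauge at 1 even though the connection is no longer usable. The callback that runs on any outcome returns it to 0.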

cgm-aw, Jun 03 '25 06:06

PR for the fix: https://github.com/eclipse-vertx/vertx-sql-client/pull/1530 (waiting for the Central issue to be solved and the build to pass before merging)

tsegismont, Jun 17 '25 09:06