trino-gateway icon indicating copy to clipboard operation
trino-gateway copied to clipboard

Add connection pooling to Trino Gateway backend datastore

Open rdsarvar opened this issue 1 year ago • 5 comments

Out-of-the-box JDBI does not seem to leverage connection pools and will create a new connection for every query against the backend datastore as seen in the public Slack channel here. During high traffic this can lead to reaching maximum connection counts against the backend datastore.

It has been noted that there are multiple ways of getting around the limitation of maximum number of connections:

  1. Optimize the underlying datastore configurations so that connections drop off quicker
  2. Use pgbouncer as a service in between Trino Gateway and PostgreSQL (assuming postgres is used as the backend store instead of MySQL or similar)
  3. Implement connection pooling as part of the Trino Gateway code itself. Example PostgresSQL connection pooling provided by JDBI docs: https://jdbi.org/#_high_availability

As an enhancement we are looking to minimize overall connections required by Trino Gateway against its backend datastore.

rdsarvar avatar Jun 03 '24 14:06 rdsarvar

Trino already has connection pooling in a number of places. Not sure if they involved JDBI though ... we definitely have some for JDBC connections though. We can probably take some inspiration from there. Or maybe there even is something in airlift (although I dont remember seeing something along that line).

mosabua avatar Jun 03 '24 15:06 mosabua

@mosabua It seems that for each query the gateway opens a new JDBC connection to store query metadata.
There is no connection pooling layer (e.g. HikariCP or shared DataSource), so connections are created per request.

During load testing, I observed around 600 concurrent connections from trino-gateway to PostgreSQL.
Typical statements executed are:

INSERT INTO query_history (...)
SELECT * FROM gateway_backend WHERE active = ? AND routing_group = ?
SELECT external_url FROM query_history WHERE query_id = ?

This looks like a missing connection pooling issue in JdbcConnectionManager, since Jdbi.create() is called per routing group / request without a shared pool.

arturmkr avatar Nov 03 '25 19:11 arturmkr

Thanks for investigating this more @arturmkr .. we definitely should add connection pooling. Will either @rdsarvar or @arturmkr work on a PR that implements that?

mosabua avatar Nov 04 '25 01:11 mosabua

@mosabua , I prepared the PR for this.

arturmkr avatar Nov 04 '25 19:11 arturmkr

Thanks @arturmkr, I was going to do this a while back but after having a baby I've been tight on spare time 😄

rdsarvar avatar Nov 04 '25 21:11 rdsarvar