Connection pool exhaustion for lazy entity association (acquisition timeout)
Describe the bug
Hi guys! I've noticed that some async processing in our project exhausts the database connection pool, and it never recovers. The problem can be fixed by adding a @Transactional annotation to the @Incoming event handler, but:
- I am not changing anything in the database during this process; I only call "findById" and read some info from the entity
- for some methods, performance matters, so I would rather not add transactional processing there
Please check the reproducer. There are two related entities and two tests:
- GreetingResourceTest: we call the HTTP endpoint 50 times in a row, and everything is fine. The connection pool works correctly and serves every request.
- GreetingKafkaTest: we send an event to the topic and wait for it to be processed. After the 4th repetition it starts to fail, because the connection pool max-size is set to 5.
Note that when you set an EAGER association between the two entities, both tests pass. I believe something is wrong with releasing the DB connection during async execution.
Expected behavior
LAZY association between entities works the same as EAGER one for non-transactional processing: connections are reused in the pool
Actual behavior
"Unable to acquire JDBC Connection" after a number of calls equal to the max pool size:
Caused by: java.sql.SQLException: Sorry, acquisition timeout!
How to Reproduce?
Output of uname -a or ver
No response
Output of java -version
java version "11.0.12" 2021-07-20 LTS Java(TM) SE Runtime Environment 18.9 (build 11.0.12+8-LTS-237) Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.12+8-LTS-237, mixed mode)
GraalVM version (if different from Java)
No response
Quarkus version or git rev
2.11.2 and smaller
Build tool (i.e. output of mvnw --version or gradlew --version)
No response
Additional information
No response
/cc @DavideD, @Sanne, @gavinking, @gsmet, @mswatosh, @yrodiere
We'll need to check why connections aren't released w/o a Transaction - I suspect some scoping issue.
But FYI, the stated reason for avoiding a transaction - performance - is misleading; I highly doubt you can see any difference. If you can measure a difference, I'd be interested :)
Also, in Quarkus, Hibernate ORM is configured to throw an exception when it detects that no transaction is active; there are multiple good reasons for this, including the fact that working in "scope-less" mode makes it tricky to be safe with resource handling - as you're noticing. Hibernate ORM's ability to detect a missing transaction is limited, though - it seems you found one of the ways to bypass the check. We might want to improve on that, so I'd suggest not doing this.
Beyond the issues you already noticed, you should also think about what it implies to re-associate a partial object in a different transaction; there are so many things that could go wrong - including that the entity might no longer exist, or that its ID was reused by a different object. I would highly recommend against such a pattern.
@Sanne, I agree with you that there is a risk in working with entities without a transactional scope. But I firmly believe it would be much better if developers could decide whether they are ok with taking that risk.
stated reason to avoid a transaction for performance is misleading, I highly doubt that you can see any difference
Let's imagine we have a long-running task (method) that only needs to read some information from the DB, without modifying it. If the processing takes time to complete (say it sends 3-4 HTTP requests to other services and waits for the responses), then with the @Transactional annotation the connection stays active the whole time. With the default pool of 20 connections and ~1 second of processing, you can't serve more than 20 RPS.
Here is a small test to understand what is wrong with transactional processing for simple select operations:
@Test
public void test() throws InterruptedException {
    AtomicInteger atomicId = new AtomicInteger(0);
    for (int i = 0; i < 10; i++) {
        CompletableFuture.runAsync(() -> {
            try {
                useTransaction(atomicId.incrementAndGet());
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
    }
    System.out.println("Waiting for results...");
    Thread.sleep(12000);
}

@Transactional
public void useTransaction(long id) throws InterruptedException {
    Entity e = someRepo.findById(id);
    Thread.sleep(3000); // long task behavior emulation
    System.out.println(e);
}
Just set quarkus.datasource.jdbc.max-size=5 and populate your table with 10 records, and you will see the 'Sorry, acquisition timeout!' exception. In real code it could be a bit more involved:
Entity1 e1 = repo1.find();
// some processing
Entity2 e2 = repo2.find();
// some processing
sendEvent(buildEvent(e1,e2));
So what does the @Transactional annotation add for such methods? To me it looks like it brings nothing but keeping the connection open - especially for a read-heavy database.
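The arithmetic above can be sketched without a database at all. In the following plain-Java toy model (an illustration, not code from the reproducer), a Semaphore stands in for a 5-connection pool and a sleep stands in for the slow downstream calls. Holding the "connection" across the slow work starves the pool; releasing it before the slow work does not:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolStarvationDemo {

    // Models the pool as a Semaphore: poolSize permits, 200 ms acquisition timeout.
    // Returns how many of the 10 concurrent "requests" timed out waiting for a permit.
    static int run(int poolSize, boolean holdDuringSlowWork) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);
        AtomicInteger timeouts = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            executor.submit(() -> {
                try {
                    if (!pool.tryAcquire(200, TimeUnit.MILLISECONDS)) {
                        timeouts.incrementAndGet(); // "Sorry, acquisition timeout!"
                        return;
                    }
                    if (holdDuringSlowWork) {
                        // @Transactional-style: permit held across the slow work
                        try { Thread.sleep(500); } finally { pool.release(); }
                    } else {
                        // read quickly, release, then do the slow work without a permit
                        pool.release();
                        Thread.sleep(500);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return timeouts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("held across slow work: " + run(5, true) + " timeouts");
        System.out.println("released before slow work: " + run(5, false) + " timeouts");
    }
}
```

The numbers (200 ms acquisition timeout, 500 ms of work) are arbitrary; the point is only that timeouts appear as soon as each task holds its permit for longer than waiters are willing to wait.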
Ok that's a good point @OleKsimov - thanks for clarifying your issue.
However, what you are describing is starvation of the pool, which is not necessarily related to using a @Transactional block.
It's commonly assumed that a transaction needs to hold a connection; conceptually, this is not true: Hibernate ORM is able to return connections to the pool so that other transactions can make progress, without this implying that the current transaction needs to be terminated.
In Quarkus, however, I configured things so that the transaction does indeed hold the connection, as that's typically what matches user expectations and is a bit more efficient - otherwise the ORM core engine needs to acquire and return a connection for each single operation, which has a little bit of overhead.
So you're right in your observation, but rather than avoiding the transaction I would suggest a better solution for your use case is to configure
hibernate.connection.handling_mode=DELAYED_ACQUISITION_AND_RELEASE_AFTER_STATEMENT
See also: https://docs.jboss.org/hibernate/orm/5.6/userguide/html_single/Hibernate_User_Guide.html#database-connection-handling
N.B. I believe Quarkus doesn't expose this property currently; it can be set via https://quarkus.io/guides/hibernate-orm#quarkus-hibernate-orm_quarkus.hibernate-orm.unsupported-properties-full-property-key
This way you can keep using transactions without worrying about each of them holding a physical connection for a long time.
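As a concrete sketch, the property would go into application.properties through the escape hatch like this (the quoted-key syntax is the one Quarkus uses for such pass-through properties; as noted above, this is not an officially supported option):

```properties
quarkus.hibernate-orm.unsupported-properties."hibernate.connection.handling_mode"=DELAYED_ACQUISITION_AND_RELEASE_AFTER_STATEMENT
```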
That said, we should also:
- figure out why it's leaking in your case (of course it shouldn't)
- decide if we should make the transactionality checks stricter (more below)
- based on your example, perhaps reconsider the default handling_mode that Quarkus sets
But I firmly believe it would be much better if developers could decide if they are ok taking that risk.
I would generally agree, but not by default. I would prefer strict checks so that people notice - most of the time, not using a transaction is unintentional, and it's a bad idea. I'm ok with offering a flag that disables such checks, as I agree that occasionally some power user might know better. In fact, such a flag already exists, but it's not documented.
@Sanne Thanks for such a great explanation. I've never heard about this flag you mentioned, and I'll certainly read the reference. You guys are doing amazing things! :heart:
I've checked this option today:
hibernate.connection.handling_mode=DELAYED_ACQUISITION_AND_RELEASE_AFTER_STATEMENT
with different values, and I even tried to set release_mode with different strategies. It looks like none of these options is honored by Quarkus: the PhysicalConnectionHandlingMode does not change and is always the same at runtime:
DELAYED_ACQUISITION_AND_RELEASE_BEFORE_TRANSACTION_COMPLETION
It is actually checked by Quarkus, but ignored in the end:
WARN: Persistence-unit [<default>] sets property 'hibernate.connection.handling_mode' to a custom value through 'quarkus.hibernate-orm.unsupported-properties."hibernate.connection.handling_mode"', but Quarkus already set that property independently. The custom value will be ignored.
Honestly I doubt this would change anything even if Quarkus propagated the value to Hibernate ORM. Under the hood, Agroal keeps one connection per transaction no matter what. See https://hibernate.zulipchat.com/#narrow/stream/132094-hibernate-orm-dev/topic/Good.20default.20for.20hibernate.2Econnection.2Ehandling_mode .
Conclusion from that discussion: if we wanted DELAYED_ACQUISITION_AND_RELEASE_AFTER_STATEMENT to work properly with Agroal, we would need Agroal to:
- suspend the transaction and release the "physical" connection when Hibernate ORM releases the "logical" connection
- re-acquire a "physical" connection and restore the transaction when Hibernate ORM re-acquires a connection during the same transaction
@barreiro I didn't get why that was a bad idea during our conversation - do you think you could explain? I get that the current behavior is better in many respects, but I'm wondering what the show-stopper is with the behavior described above, which would enable much higher concurrency for applications with long-running transactions.
Thanks for the great reproducer - I finally had some time to dive into it, apologies it took some time.
I believe we've indeed found a leak in Hibernate ORM: the code path for lazy loading triggered on an enhanced entity, with an existing open session but no active transaction, is not defensive enough in its handling of the JdbcCoordinator.
The workaround is indeed to use a Transaction.
Interestingly enough, this particular bug also bypasses a check that tests whether such a load operation is allowed to start outside of a transaction; had that worked as intended, you'd have gotten an exception rather than a leaked connection. So I'm unsure about the fix we want - I'll need to discuss it with the other team members.
- https://github.com/hibernate/hibernate-orm/pull/5249
I was wrong yesterday - ORM isn't leaking in practice, as there's a second line of defence. I need to keep looking.
We'll need to check why connections aren't released w/o a Transaction - I suspect some scoping issue.
It turns out my very initial thought ^ was actually spot on. What is happening is that the Hibernate Session is being opened but never closed. Normally a Transaction would also have terminated the connection; without one, the failure to close the Session becomes critical.
When not using a transaction, the lifecycle of the Session is bound to the CDI RequestScope.
If there is no request scope, you'd get an exception; in this particular case a request scope is set up automatically for each incoming Kafka message, but somewhere in between these scopes are not being terminated, even though new ones are installed on each incoming message - this might be an effect of the testing infrastructure, though. What's happening there isn't entirely clear to me, and I'll need help from the CDI or Kafka experts.
This is now fixed by #27802 - but bear in mind that going forward you might need to use @ActivateRequestContext, as an incoming message listener won't have the scope by default: we decided it's better to make it explicit, so that there's a well-defined beginning of the scope - and, more importantly, you get to choose where it ends.
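For readers landing here later, a minimal sketch of what that explicit activation might look like (the channel name, repository, and ID type are made up for illustration; on Quarkus 2.x the annotations live in the javax.* namespace rather than jakarta.*):

```java
import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.context.control.ActivateRequestContext;

import org.eclipse.microprofile.reactive.messaging.Incoming;

@ApplicationScoped
public class GreetingConsumer {

    @Incoming("greetings")      // hypothetical channel name
    @ActivateRequestContext     // explicitly open a request scope for this message
    public void consume(Long id) {
        // Lazy loading now runs inside an active request scope, so the
        // Hibernate Session (and its JDBC connection) is released when
        // the scope ends at the end of this method.
        // greetingRepository.findById(id); ...
    }
}
```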