Revisit usage of database connection pooling

Open nscuro opened this issue 3 years ago • 0 comments

Current Behavior:

While experimenting with persistence metrics exposition, I noticed that there are more database connections (and even more connection pools) than I expected:

hikaricp_connections_idle{pool="HikariPool-3",} 9.0
hikaricp_connections_idle{pool="HikariPool-4",} 11.0
hikaricp_connections_max{pool="HikariPool-3",} 20.0
hikaricp_connections_max{pool="HikariPool-4",} 20.0
hikaricp_connections_min{pool="HikariPool-3",} 10.0
hikaricp_connections_min{pool="HikariPool-4",} 10.0
hikaricp_connections{pool="HikariPool-3",} 10.0
hikaricp_connections{pool="HikariPool-4",} 11.0

This was also reflected in the database itself, where I could see 21 active JDBC connections originating from DT.

According to the DataNucleus documentation:

Datastore connections are obtained from up to 2 connection factories. The primary connection factory is used for persistence operations, and optionally for value generation operations. The secondary connection factory is used for schema generation, and optionally for value generation operations.

If not specified otherwise, the configured connection pool will thus be created twice: once for DN's primary connection factory and once for its secondary one. That means that whatever pool size we configure via Alpine, the actual number of connections is doubled, which may explain some of the connection issues that were reported to us in the past. This behavior is unexpected from the user's perspective.
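To make the doubling concrete, here is a minimal sketch of the arithmetic, using the per-pool values from the metrics above (max 20, min idle 10). The class and method names are made up for illustration, and it assumes exactly one pool per connection factory, as described in the DN docs quoted above.

```java
// Sketch: estimate how many connections may be held when the same pool
// configuration is applied to both DataNucleus connection factories.
final class PoolSizeEstimate {

    // Assumption: one pool is created per connection factory
    // (primary and secondary) unless configured otherwise.
    static final int CONNECTION_FACTORIES = 2;

    /** Returns the combined value across all pools for a per-pool setting. */
    static int effective(int perPoolValue) {
        return perPoolValue * CONNECTION_FACTORIES;
    }

    public static void main(String[] args) {
        // Per-pool values taken from the metrics above.
        System.out.println("max connections: " + effective(20));      // prints "max connections: 40"
        System.out.println("min idle connections: " + effective(10)); // prints "min idle connections: 20"
    }
}
```

So a user who configures a maximum pool size of 20 may actually end up with up to 40 connections against their database.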

Similarly, notice how the metrics above say HikariPool-3 and HikariPool-4. This is because two other instances were created temporarily by the upgrade framework:

https://github.com/DependencyTrack/dependency-track/blob/e9304da3beba4776784da9104edcefbf6da0b32f/src/main/java/org/dependencytrack/upgrade/UpgradeInitializer.java#L66

This temporarily spins up two connection pools with (by default) 10 idle connections each. Because upgrades are executed serially, a connection pool is probably overkill here. A single connection should suffice in this case.

Proposed Behavior:

There are not many big OSS projects using DataNucleus, but Apache Hive is one of them. I looked into how they handle the connection pool situation, and they settled for using a smaller, fixed-size connection pool for DN's secondary connection factory: https://github.com/hsnusonic/hive/blob/714c260e4a7c6b147c897718a33e693699267792/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java#L256-L259
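A rough sketch of what that could look like for us, under a few assumptions: `PoolSpec` is a hypothetical stand-in for a real pooled `DataSource` (e.g. a `HikariDataSource`) so that the example runs without dependencies, the secondary pool size of 2 is an illustrative value that would need load testing, and the `datanucleus.ConnectionFactory` / `datanucleus.ConnectionFactory2` property keys should be double-checked against the DN docs for the version we use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SecondaryPoolSketch {

    /** Hypothetical stand-in for a real pooled DataSource such as a HikariDataSource. */
    static final class PoolSpec {
        final String name;
        final int minIdle;
        final int maxSize;

        PoolSpec(String name, int minIdle, int maxSize) {
            this.name = name;
            this.minIdle = minIdle;
            this.maxSize = maxSize;
        }
    }

    // Property keys for handing pre-built connection factories to DataNucleus
    // (to be verified against the DN documentation).
    static final String PRIMARY_KEY = "datanucleus.ConnectionFactory";
    static final String SECONDARY_KEY = "datanucleus.ConnectionFactory2";

    // Assumed value for illustration only; the right size needs load testing.
    static final int SECONDARY_POOL_SIZE = 2;

    /** Builds properties with a user-sized primary pool and a small, fixed-size secondary pool. */
    static Map<String, Object> buildProperties(int primaryMax, int primaryMinIdle) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put(PRIMARY_KEY, new PoolSpec("primary", primaryMinIdle, primaryMax));
        // Fixed size: minIdle == maxSize, so the pool never grows or shrinks.
        props.put(SECONDARY_KEY, new PoolSpec("secondary", SECONDARY_POOL_SIZE, SECONDARY_POOL_SIZE));
        return props;
    }

    /** Sums the maximum sizes of all pools in the given properties. */
    static int totalMax(Map<String, Object> props) {
        int total = 0;
        for (Object value : props.values()) {
            if (value instanceof PoolSpec) {
                total += ((PoolSpec) value).maxSize;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Object> props = buildProperties(20, 10);
        System.out.println("max connections: " + totalMax(props)); // prints "max connections: 22"
    }
}
```

With a configured maximum of 20, this would bring the upper bound down from 40 to 22 connections.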

We should test whether we can do something similar to limit the number of connections we hoard. This will need to be tested in high-load situations, to ensure that it doesn't slow down the system.

Additionally, the upgrade framework should be adjusted to not use a connection pool.
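For the upgrade part, something along these lines might be enough. This is only a sketch using plain JDBC: `UpgradeStep` and both method names are hypothetical and do not reflect how the upgrade framework is actually structured.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class SingleConnectionUpgradeSketch {

    /** Hypothetical representation of one upgrade step. */
    interface UpgradeStep {
        void execute(Connection connection) throws SQLException;
    }

    /** Runs all steps one after another on the same connection and returns how many ran. */
    static int runAll(Connection connection, List<UpgradeStep> steps) {
        int executed = 0;
        for (UpgradeStep step : steps) {
            try {
                step.execute(connection);
            } catch (SQLException e) {
                // Wrapped in an unchecked exception to keep the sketch short.
                throw new IllegalStateException("Upgrade step failed", e);
            }
            executed++;
        }
        return executed;
    }

    /** Opens a single, unpooled connection, runs the upgrade, and closes the connection again. */
    static int upgrade(String jdbcUrl, String user, String password, List<UpgradeStep> steps)
            throws SQLException {
        try (Connection connection = DriverManager.getConnection(jdbcUrl, user, password)) {
            return runAll(connection, steps);
        }
    }

    public static void main(String[] args) {
        // Demonstration without a database: the no-op steps ignore the (null) connection.
        List<UpgradeStep> steps = new ArrayList<>();
        steps.add(connection -> { });
        steps.add(connection -> { });
        System.out.println("steps executed: " + runAll(null, steps)); // prints "steps executed: 2"
    }
}
```

Since the steps run serially anyway, the one connection is never contended, and it is closed as soon as the upgrade finishes.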

Sep 07 '22 20:09 nscuro