scio

JDBC IO: pipeline gets stuck on attempt to write to Postgres

Open · stormy-ua opened this issue 2 years ago · 2 comments

A pipeline has been consistently getting stuck when attempting to write to Postgres via JDBC. A thread dump on one worker revealed a number of threads waiting for a connection to be allocated from the pool:

   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <45ac842e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:581)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:437)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:354)
        at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
        at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:734)
        at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn.executeBatch(JdbcIO.java:1449)
        at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn.processElement(JdbcIO.java:1398)
        at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)

There were 11 such threads waiting for a connection from the pool, while the other workers were idle. It looks like this single worker was holding the watermark back, so the pipeline stopped making progress and appeared stuck. The default maximum number of connections in the commons-dbcp2 pool is 8, and Beam neither overrides that default nor exposes it as a separate config for bumping it. Scio, in turn, doesn't expose it either. There is a break-glass approach to configuring it, which was referenced in BEAM-9629.
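For reference, a minimal sketch of that break-glass approach (not the exact snippet from BEAM-9629): Beam's JdbcIO.write() accepts withDataSourceProviderFn, so the pipeline can supply its own commons-dbcp2 BasicDataSource with a larger maxTotal instead of relying on the defaults built from DataSourceConfiguration. The connection details, pool size of 16, table, and row type below are all placeholders.

    import java.io.Serializable;
    import javax.sql.DataSource;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.commons.dbcp2.BasicDataSource;

    // Sketch only: a DataSource provider that raises the dbcp2 pool limit
    // above the default of 8. All connection details are placeholders.
    class PooledDataSourceProviderFn implements SerializableFunction<Void, DataSource> {
      // transient: rebuilt lazily on each worker after deserialization
      private transient BasicDataSource dataSource;

      @Override
      public synchronized DataSource apply(Void input) {
        if (dataSource == null) {
          dataSource = new BasicDataSource();
          dataSource.setDriverClassName("org.postgresql.Driver");
          dataSource.setUrl("jdbc:postgresql://db-host:5432/mydb"); // placeholder
          dataSource.setUsername("user");                           // placeholder
          dataSource.setPassword("password");                       // placeholder
          dataSource.setMaxTotal(16); // raise the dbcp2 default of 8 connections
        }
        return dataSource;
      }
    }

    // Usage sketch (row type, SQL, and setters are illustrative):
    // records.apply(
    //     JdbcIO.<MyRow>write()
    //         .withDataSourceProviderFn(new PooledDataSourceProviderFn())
    //         .withStatement("INSERT INTO my_table (id, value) VALUES (?, ?)")
    //         .withPreparedStatementSetter((row, stmt) -> {
    //           stmt.setLong(1, row.id);
    //           stmt.setString(2, row.value);
    //         }));

Whether this is enough on its own depends on how Beam wraps the provided DataSource internally, which is part of what the follow-up investigation below should confirm.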

This work should also be done together with an investigation into why DB connections aren't being reused. Does a failed batch leak a DB connection that is never returned to the pool?

stormy-ua commented Oct 15 '21 22:10

Beam issue to expose the max connections in the pool as a setting: https://issues.apache.org/jira/browse/BEAM-13261

stormy-ua commented Nov 16 '21 15:11

Moving to 0.12.1 as this is not yet fixed in Beam.

kellen commented Jul 06 '22 17:07