snowplow-rdb-loader

RDB Loader: fix serializable isolation violation errors

Open · chuwy opened this issue 4 years ago · 1 comment

We have tens of occurrences of this error across different clients:

com.snowplowanalytics.snowplow.rdbloader.LoaderError$StorageTargetError: Database error: [Amazon](500310) Invalid operation: 1023 
Details: Serializable isolation violation on table - 100167, transactions forming the cycle are: 55705701, 55705754 (pid:16606);
    at com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.$anonfun$executeUpdate$1(JDBC.scala:165)
    at raiseError @ retry.package$.retry$package$$retryingOnSomeErrorsImpl(package.scala:81)
    at map @ com.snowplowanalytics.iglu.client.Client$.parseDefault(Client.scala:56)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
    at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)

To me it looks like the Loader's COPY transaction races with another client's transaction. Or maybe our retry mechanism somehow runs a parallel transaction? If it's the former, we can handle it with a plain failure handler, retrying the whole load a bit later.
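
A minimal sketch of what such a failure handler could look like (assuming cats-effect 3; `load`, the message-matching predicate, and the attempt/pause defaults are illustrative assumptions, not the Loader's actual API or error hierarchy):

```scala
import java.sql.SQLException
import scala.concurrent.duration._
import cats.effect.IO

object SerializableIsolationRetry {

  // Heuristic: Redshift reports these failures as error 1023 with this
  // message fragment. The exact matching rule (and whether the error
  // surfaces as a raw SQLException or wrapped in a Loader error) is an
  // assumption for this sketch.
  def isIsolationViolation(t: Throwable): Boolean = t match {
    case e: SQLException =>
      Option(e.getMessage).exists(_.contains("Serializable isolation violation"))
    case _ => false
  }

  // Re-run the whole load (assumed to be a single IO action wrapping the
  // COPY transaction) a few times, pausing between attempts so the
  // conflicting transaction has a chance to finish.
  def loadWithRetry[A](load: IO[A],
                       attemptsLeft: Int = 3,
                       pause: FiniteDuration = 1.minute): IO[A] =
    load.handleErrorWith {
      case e if isIsolationViolation(e) && attemptsLeft > 1 =>
        IO.sleep(pause).flatMap(_ => loadWithRetry(load, attemptsLeft - 1, pause))
      case other =>
        IO.raiseError(other)
    }
}
```

The key point is that the retry wraps the entire load transaction, not the individual COPY statement, so the re-run starts from a clean transaction rather than re-entering the aborted one.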

chuwy · Aug 31 '21 14:08

Found the same (or a very similar) issue in our QA pipeline, which doesn't have any client transactions. What's interesting is that there are a lot of these exceptions even though the pipeline handles very low volume.

Exception (most recent call first):

com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: 1023 
Details: Serializable isolation violation on table - 100167, transactions forming the cycle are: 55705701, 55705754 (pid:16606);
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse
    at com.amazon.redshift.client.PGMessagingContext.handleMessage
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass
    at com.amazon.redshift.client.PGMessagingContext.getErrorResponse
    at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution
    at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute
    at com.amazon.redshift.dataengine.CallablePreparedOrAtomicExecuteTask.call
    at com.amazon.redshift.dataengine.CallablePreparedOrAtomicExecuteTask.call
    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)

java.sql.SQLException: [Amazon](500310) Invalid operation: 1023 
Details: Serializable isolation violation on table - 100167, transactions forming the cycle are: 55705701, 55705754 (pid:16606);
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse
    at com.amazon.redshift.client.PGMessagingContext.handleMessage
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass
    at com.amazon.redshift.client.PGMessagingContext.getErrorResponse
    at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution
    at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute
    at com.amazon.redshift.dataengine.CallablePreparedOrAtomicExecuteTask.call
    at com.amazon.redshift.dataengine.CallablePreparedOrAtomicExecuteTask.call
    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
    at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:750)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:750)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
    at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:750)
    at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)

chuwy · Aug 31 '21 14:08