spring-batch
spring-batch copied to clipboard
OptimisticLockingFailureException updating step execution after commit failure
Quinton McCombs opened BATCH-1767 and commented
It appears that if the commit fails, spring batch will get an OptimisticLockingFailureException when it tries to revert the changes to the step execution. In my particular case, I have a callback through EclipseLink to update history tables before a transaction is committed. If a failure occurs during this callback, the commit fails.
From looking through the code and the attached log file, the step execution is updated and committed before the main transaction is committed. When the commit fails, the old values for the step execution (including version) are updated to the values before the chuck started. When control returns to AbstractStep.execute(), the OptimisticLockingFailureException is thrown when the step execution is updated with the failed status because the new version had already been committed to the database.
2011-07-06 17:40:29,494 ERROR SimpleAsyncTaskExecutor-1 [org.springframework.batch.core.step.AbstractStep] Encountered an error saving batch meta data. This job is now in an unknown state and should not be restarted.
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=3225 with wrong version (35), where current version is 36
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:185)
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:171)
at sun.reflect.GeneratedMethodAccessor130.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:309)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at $Proxy77.update(Unknown Source)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:244)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
at org.springframework.batch.core.job.flow.support.state.SplitState$1.call(SplitState.java:91)
at org.springframework.batch.core.job.flow.support.state.SplitState$1.call(SplitState.java:89)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.lang.Thread.run(Thread.java:662)
Affects: 2.1.7
Attachments:
- BATCH-1767.log.gz (16.78 kB)
- log-1.txt (43.41 kB)
- scenario.txt (2.96 kB)
- spring-batch-bug.zip (11.15 kB)
Referenced from: pull request https://github.com/spring-projects/spring-batch/pull/591
23 votes, 31 watchers
Dave Syer commented
You need to flush JPA changes before Spring Batch tries to commit its own meta data. The best way to do that is in the ItemWriter. Any reason you can't do that?
Quinton McCombs commented
We are already using the JpaWriter supplied with Spring Batch which performs the flush. We have a requirement to audit all database changes and identify the changes made during a single transaction. For this reason, we implemented our auditing using an Eclipselink pre-commit hook (Hibernate can do this as well). While the transaction level auditing does not make much sense for a batch job, it is what is used by the rest of the application.
Dave Syer commented
If you are using transaction hooks you are going to have to tread very carefully. You need to somehow ensure that the vendor hook runs before the Spring transaction synchronization, or else not use it. The easiest way would be to run your audit code in an ItemWriter or ItemWriteListener, but if you find a cleverer way and it can be exposed as a framework feature, please let us know. If you can provide a sample project (preferably self-contained, with an in-memory database, and built using maven, but we'll take whatever you have), that would help. Or if you want to make a suggestion about a patch you can fork on github and send a pull request (the README has more information).
Ramkumar Kirhsnan commented
We've also got the same failure after the one of the steps failed to commit due to database exception thrown while writing into the database. Following is the stack trace when i tried to run the job for only one item
Jul 13 10:45:30,467 INFO populateAccountSnapshotReportJob [SimpleAsyncTaskExecutor-2] [org.springframework.batch.core.job.SimpleStepHandler] 133 - Executing step: [processAccountsStep]
Jul 13 10:45:30,514 DEBUG populateAccountSnapshotReportJob [SimpleAsyncTaskExecutor-2] [com.om.dh.batch.core.partition.KeySetPartitioner] 79 - partition: partition-1, keys=[1000000069567]
Jul 13 10:45:30,576 DEBUG populateAccountSnapshotReportJob [workerThreadPool-1-1] [com.om.dh.batch.core.partition.PartitionKeySetPagingItemReader] 104 - Reading page 0
Jul 13 10:45:30,576 DEBUG populateAccountSnapshotReportJob [workerThreadPool-1-1] [com.om.dh.batch.item.adapter.AbstractDelegatingAdapter] 60 - processing started for 1000000069567
Jul 13 10:45:31,217 DEBUG populateAccountSnapshotReportJob [workerThreadPool-1-1] [com.om.dh.batch.item.adapter.AbstractDelegatingAdapter] 103 - processing complete for 1000000069567 elapsed=641ms
Jul 13 10:45:31,279 ERROR populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.hibernate.util.JDBCExceptionReporter] 465 - Data truncation: Data too long for column 'calculation_details' at row 1
Jul 13 10:45:31,279 ERROR populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.hibernate.event.def.AbstractFlushingEventListener] 543 - Could not synchronize database state with session
org.hibernate.exception.DataException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:100)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:275)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:50)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1027)
at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:365)
at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:137)
at org.springframework.orm.hibernate3.HibernateTransactionManager.doCommit(HibernateTransactionManager.java:655)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:732)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:701)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:263)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:248)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:107)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.sql.BatchUpdateException: Data truncation: Data too long for column 'calculation_details' at row 1
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1666)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1082)
at com.mysql.jdbc.jdbc2.optional.StatementWrapper.executeBatch(StatementWrapper.java:719)
at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeBatch(CachedPreparedStatement.java:476)
at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:774)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:70)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:268)
... 25 more
Jul 13 10:45:31,310 INFO populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.springframework.batch.core.step.tasklet.TaskletStep] 327 - Commit failed while step execution data was already updated. Reverting to old version.
Jul 13 10:45:31,357 ERROR populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.springframework.batch.core.step.tasklet.TaskletStep] 423 - JobRepository failure forcing exit with unknown status
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=671991 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:185)
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:171)
at sun.reflect.GeneratedMethodAccessor143.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy77.update(Unknown Source)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:417)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:263)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:248)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:107)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Jul 13 10:45:31,357 INFO populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.springframework.batch.core.step.tasklet.TaskletStep] 327 - Commit failed while step execution data was already updated. Reverting to old version.
Jul 13 10:45:31,357 ERROR populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.springframework.batch.core.step.AbstractStep] 212 - Encountered an error executing the step
org.springframework.batch.core.step.FatalStepExecutionException: JobRepository failure forcing exit with unknown status
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:426)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:263)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:248)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:107)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=671991 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:185)
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:171)
at sun.reflect.GeneratedMethodAccessor143.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy77.update(Unknown Source)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:417)
... 15 more
Jul 13 10:45:31,404 ERROR populateAccountSnapshotReportJob [workerThreadPool-1-1] [org.springframework.batch.core.step.AbstractStep] 250 - Encountered an error saving batch meta data. This job is now in an unknown state and should not be restarted.
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=671991 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:185)
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:171)
at sun.reflect.GeneratedMethodAccessor143.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy77.update(Unknown Source)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:244)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:107)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Jul 13 10:45:31,404 ERROR populateAccountSnapshotReportJob [SimpleAsyncTaskExecutor-2] [org.springframework.batch.core.step.AbstractStep] 212 - Encountered an error executing the step
org.springframework.batch.core.JobExecutionException: Partition handler returned an unsuccessful step
at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:110)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:281)
at com.om.dh.batch.service.impl.JobExecutionServiceImpl.createAndExecuteJobRunnable(JobExecutionServiceImpl.java:361)
at com.om.dh.batch.service.impl.JobExecutionServiceImpl.doExecuteJob(JobExecutionServiceImpl.java:325)
at com.om.dh.batch.service.impl.JobExecutionServiceImpl.access$000(JobExecutionServiceImpl.java:51)
at com.om.dh.batch.service.impl.JobExecutionServiceImpl$1.run(JobExecutionServiceImpl.java:267)
at java.lang.Thread.run(Thread.java:619)
Dave Syer commented
I think that problem is unavoidable if you don't flush the Hibernate session in your ItemWriter. You have made life complicated by using a partition step, but the same failure mode is easy to see in a single threaded normal step. Can you just do the flush?
Kristof Buts commented
In our current project, we are also experiencing this problem in some of our batches, but until yesterday we couldn't reproduce it. After two days trying to reproduce the problem, I finally managed to do it. I already uploaded my little (maven) project for reproducing the bug.
So, what did I figure out? Consider the following scenario for a batch running 3 threads:
Thread # | line | What is going on? | oldVersion.version | stepExecution.version | DB version | comments
th#1 | 380 | copy(stepExecution, oldVersion); | 1 | 1 | 1 |
th#2 | 380 | copy(stepExecution, oldVersion); | 1 | 1 | 1 |
th#3 | 380 | copy(stepExecution, oldVersion); | 1 | 1 | 1 |
th#1 | 429 | getJobRepository().updateExecutionContext(stepExecution); | 1 | 1 | will become 2 | step execution to be committed
th#1 | 430 | stepExecution.incrementCommitCount(); | 1 | 2 | will become 2 |
th#1 | / | transaction is commited | 1 | 2 | 2 |
th#2 | 429 | getJobRepository().updateExecutionContext(stepExecution); | 1 | 2 | will become 3 | step execution to be committed
th#2 | 430 | stepExecution.incrementCommitCount(); | 1 | 3 | will become 3 |
th#2 | / | transaction is rolled back | 1 | 2 | 2 | for some reason tx is rolled back
th#2 | 344 | copy(oldVersion, stepExecution); | 1 | 1 | 2 | step version is reverted to 1 -> WTF?
th#3 | 429 | getJobRepository().updateExecutionContext(stepExecution); | 1 | 1 | 2 | OptimisticLockException will be thrown
note:
-the line numbers are the ones of class org.springframework.batch.core.step.tasklet.TaskletStep and its inner class ChunkTransactionCallback
-the oldVersion.version is the one which is assigned in the current thread. Since I assume all 3 threads to first go through this piece of code, they will be 1 for all 3 threads in this case. At least thread number 2 should have value 1 for his old version to produce the bug. In that case the wrong version (1 instead of 2) will be reverted later on.
-the rollback of transaction 2 can be caused by:
-a timeout
-some trigger that still needs to be executed at commit-time
-deferred db constraint checks
-execution of some sql statements that were not flushed yet
-own registered transaction synchronization that fails
-...
-For the bug to occur, it is important that the rollback is triggered after line 422 ((stepExecutionUpdated = true), since that is the line which causes the TransactionSynchronization method to revert the stepExecution object (see if-statement at line 339 - if (stepExecutionUpdated) )
So, the problem is that copying the oldVersion variable is done outside the synchronized block. By putting it inside the block (say at line 407 instead of line 380, just after setting lock to true), I think the bug would be solved (I already tested this by debugging my test with a conditional breakpoint in eclipse which called this code at the right place and it worked).
Is it possible to get this fix in a patched version for spring-batch-core-2.1.8.RELEASE? We cannot upgrade to a higher spring version since we are only one month away from a big release.
Kind regards, Kristof
Philippe Mouawad commented
Hello, I am facing the same issue with Spring Batch 2.1.8 and 2.1.9.
Is there some plan to take into account what Kristof Buts.
Stacktrace:
threadPoolTaskExecutorForBatch-12] 12:07:10,363 ERROR org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:438) - JobRepository failure forcing exit with unknown status
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=8 with wrong version (217), where current version is 223
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:185)
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:171)
at sun.reflect.GeneratedMethodAccessor802.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:309)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy48.update(Unknown Source)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:432)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:258)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[threadPoolTaskExecutorForBatch-12] 12:07:10,366 ERROR com.xxx.dpas.log.Logger.error(Logger.java:89) - ----- doROLLBACK Transaction -----
[threadPoolTaskExecutorForBatch-12] 12:07:10,367 ERROR com.xxx.dpas.log.Logger.error(Logger.java:89) - ----- doCLEANUP Transaction -----
[threadPoolTaskExecutorForBatch-13] 12:07:10,372 ERROR org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:438) - JobRepository failure forcing exit with unknown status
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=8 with wrong version (221), where current version is 223
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:185)
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:171)
at sun.reflect.GeneratedMethodAccessor802.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:309)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy48.update(Unknown Source)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:432)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:258)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Philippe Mouawad commented
Please find attached the spring batch classes put in INFO and DEBUG fororg.springframework.batch.core.step.tasklet.
As you can see before failure we have this: [threadPoolTaskExecutorForBatch-2] 15:30:46,282 INFO org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.afterCompletion(TaskletStep.java:342) - Commit failed while step execution data was already updated. Reverting to old version.
Philippe Mouawad commented
Hello, Please find pull request here: https://github.com/SpringSource/spring-batch/pull/214
Hope you can take it into account soon. Regards Philippe
Philippe Mouawad commented
Hello, After further analysis, I don't understand why copy method when restoring also copies the version number of oldVersion to stepExecution ?
Could you shed some light on this ?
Thank you Regards
Michael Minella commented
I added a comment about the pull request. In short, moving the copy to where it is won't work because the StepExecution could have already been modified in the step. There are a couple tests that are failing with that update that illustrate this.
With regards to why the copy method restores the version number...why wouldn't it? The intent is that there was a rollback so the state should match what was there before the rollback.
Philippe Mouawad commented
Thanks for feedback. Further testing this morning on real world project showed me fix did not correct all cases, my real world case is still broken although attached test case was fixed by it.
Does it mean you confirm this bug is a real one ? Test project attached reproduces it and my real world also does. Is there some plan to fix it ? within which delay ?
It seems anyway that the stepexecution is modified somewhere before restoring which generates the case.
Kristof Buts commented
@Michael Minella,
I'm afraid there is another (hidden) issue with this code:
When running a batch multithreaded, one thread can commit stuff from another thread which can be rollbacked at the end. Shouldn't changes to the stepExecution object be recorded and executed against the actual stepExecution object at commit time instead of instantly applying to the real StepExecution object and reverting on rollback afterwards (just wondering) ?
Gérald Quintana commented
We has this OptimisticLockingFailureException, when we noticed this log just before: "Commit failed while step execution data was already updated. Reverting to old version." . Turning org.springframework.batch.core.step.tasklet and org.springframework.transaction loggers level to debug helped us find a bug in our tasklet code.
In our case:
- The tasklet produces wrong data but the chunk is property processed and no exception is raised here
- Chunk transaction commit triggers a deferred database constraint which prevents commit (it may also occur with JPA where transaction commit triggers flushing)
- This exception is not logged (maybe because of the RetryTemplate even if retry is not configured)
- The TransactionManager decides to call doRollbackOnCommitException
- The ChunkTransactionCallback transaction synchronization is called with status=ROLLBACKED
zyro commented
i think i am currently facing this (or at least sth. similar) in the following scenario:
- business logic throws an OptimisticLockingFailureException during commit marking the current transaction rollback-only
- jdbcTemplate update count is 0 during step execution update (does this happen if current tx is rollback-only?)
- https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/java/org/springframework/batch/core/repository/dao/JdbcStepExecutionDao.java#L252
- a little bit misleading exception is thrown: Attempt to update step execution id=123 with wrong version (2), where current version is 1
- https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/java/org/springframework/batch/core/repository/dao/JdbcStepExecutionDao.java#L255
- stepExecution is stuck with exit status EXECUTING and jobExecution is stuck with status UNKNOWN and exit status UNKNOWN
member sound commented
Is this issue still active? I'm also getting the OptimisticLockFailureException. The cause for me was an insert on a column whose length was not big enough for the inserted data, thus throwing SQL Exception and thereby the rollback:
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper: SQL Error: 0, SQLState: 22001
ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper: FEHLER: Wert zu lang für Typ character varying(2)
INFO org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: HHH000010: On release of batch it still contained JDBC statements
ERROR org.springframework.batch.core.step.tasklet.TaskletStep: JobRepository failure forcing rollback
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=29 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.MapStepExecutionDao.updateStepExecution(MapStepExecutionDao.java:102) ~[spring-batch-core-3.0.2.RELEASE.jar:3.0.2.RELEASE]
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:191) ~[spring-batch-core-3.0.2.RELEASE.jar:3.0.2.RELEASE]
Caused by: org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=29 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.MapStepExecutionDao.updateStepExecution(MapStepExecutionDao.java:102) ~[spring-batch-core-3.0.2.RELEASE.jar:3.0.2.RELEASE]
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:191) ~[spring-batch-core-3.0.2.RELEASE.jar:3.0.2.RELEASE]
ERROR org.springframework.batch.core.step.AbstractStep: Encountered an error saving batch meta data for step step in job myJobConfig. This job is now in an unknown state and should not be restarted.
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=29 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.MapStepExecutionDao.updateStepExecution(MapStepExecutionDao.java:102) ~[spring-batch-core-3.0.2.RELEASE.jar:3.0.2.RELEASE]
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:191) ~[spring-batch-core-3.0.2.RELEASE.jar:3.0.2.RELEASE]
Mohamed commented
Hi, I have the same problem, and I don't know how to resolve it. I debuged my code but no way to find the cause.
12:21:42,400 DEBUG @ [FaultTolerantChunkProcessor] Attempting to write: [items=[StagingItem [475472]], skips=[]]
12:21:42,400 DEBUG @ [RetryTemplate] Retry: count=0
12:21:42,402 DEBUG @ [ChunkOrientedTasklet] Inputs not busy, ended: false
12:21:42,402 DEBUG @ [TaskletStep] Applying contribution: [StepContribution: read=1, written=1, filtered=0, readSkips=0, writeSkips=0, processSkips=0, exitStatus=EXECUTING]
12:21:42,402 DEBUG @ [JtaTransactionManager] Participating in existing transaction
12:21:42,404 DEBUG @ [TaskletStep] Saving step execution before commit: StepExecution: id=5026, version=14, name=step1, status=STARTED, exitStatus=EXECUTING, readCount=14, filterCount=0, writeCount=14 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=14, rollbackCount=0, exitDescription=
12:21:42,404 DEBUG @ [JtaTransactionManager] Participating in existing transaction
12:21:42,409 DEBUG @ [JtaTransactionManager] Participating transaction failed - marking existing transaction as rollback-only
12:21:42,409 DEBUG @ [JtaTransactionManager] Setting JTA transaction rollback-only
12:21:42,412 ERROR @ [TaskletStep] JobRepository failure forcing exit with unknown status
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=5026 with wrong version (14), where current version is 13
Is there any way to discover the origin of this error? Thanks in advance
Adam commented
Same thing as for @Mohamed. No way to find out what is the root cause of the rollback attempt:
12:28:49.420 [SimpleAsyncTaskExecutor-10] INFO p.d.i.i.c.IncrementalImportJobFactory - Starting the incremental documents import.
12:51:01.738 [SimpleAsyncTaskExecutor-5] INFO p.d.i.i.c.IncrementalImportJobFactory - Finished the incremental documents import.
12:51:01.756 [SimpleAsyncTaskExecutor-5] ERROR o.s.b.core.step.tasklet.TaskletStep - JobRepository failure forcing rollback
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=23580 with wrong version (1), where current version is 2
at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:255) ~[spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:191) ~[spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at sun.reflect.GeneratedMethodAccessor136.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_55]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_55]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) ~[spring-aop-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) ~[spring-aop-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99) ~[spring-tx-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:281) ~[spring-tx-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96) ~[spring-tx-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207) ~[spring-aop-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at com.sun.proxy.$Proxy152.update(Unknown Source) ~[na:na]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:452) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133) [spring-tx-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:271) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368) [spring-batch-infrastructure-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) [spring-batch-infrastructure-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144) [spring-batch-infrastructure-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:198) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:67) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:165) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:144) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.state.SplitState$1.call(SplitState.java:93) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.state.SplitState$1.call(SplitState.java:90) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_55]
at org.springframework.core.task.SimpleAsyncTaskExecutor$ConcurrencyThrottlingRunnable.run(SimpleAsyncTaskExecutor.java:251) [spring-core-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
Gérald Quintana commented
@Spyro @ medhedi_abd The problem occurs, not when the chunk read-process-write fails, but when the commit fails. In the past I've seen 2 reasons for that:
- Using Hibernate or JPA: when the read-process-write ends, data has not been flushed yet, and the commit triggers the flush and may raise a an exception (SQL constraint not fulfilled...). The solution is to force a flush (call .flush() on entity manager) and the end of the writer.
- Using Oracle deferred constraints: when the-process-writes ends, SQL constraints have not been checked by Oracle yet, and the commit triggers these SQL constraints verification and may raise an exception (same reason as above). The solution is to replace deferred constraint by immediate constraint. In the end, the commit shouldn't fail, if something goes wrong, it should fail before
Mohamed commented
Hi,
@Gerald, I agree with you,
Indead, this error occurs also when using Jboss4 EAP. and the treatment of batch works well without any error when using "WebSphere 7".
Is there any modification into jboss to fix this problem?, or transformation in spring-batch adapting of jboss transaction?
Thanks in advance
Gérald Quintana commented
@medhedi_Abd In your case it may because of a third reason: "using JTA". If the XA transactions fails to commit a resource for any resource (timeout or whatever).
Having logs of commit failure would help, but it's not the case.
Marcus Mattern commented
We did a simple workaround for us to avoid this problem. I only patched the afterCompletion method of inner class org.springframework.batch.core.step.tasklet.TaskletStep.ChunkTransactionCallback.
{{ @Override
public void afterCompletion(int status) {
try {
if (status != TransactionSynchronization.STATUS_COMMITTED) {
if (stepExecutionUpdated) {
// Wah! the commit failed. We need to rescue the step
// execution data.
logger.info("Commit failed while step execution data was already updated. "
-
"Reverting to old version."); //FIX Start // WORKAROUND für https://jira.springsource.org/browse/BATCH-1767 // OptimisticLockingFailureException updating step execution after commit failure // Description: // - save version number of step execution that should be updated // - copy old step details like before // - set new version number instead of old copied number final Integer savedVersion = stepExecution.getVersion(); //FIX End copy(oldVersion, stepExecution); //FIX Start stepExecution.setVersion(savedVersion); //FIX End if (status == TransactionSynchronization.STATUS_ROLLED_BACK) { rollback(stepExecution); } } chunkListener.afterChunkError(chunkContext); }
if (status == TransactionSynchronization.STATUS_UNKNOWN) { logger.error("Rolling back with transaction in unknown state"); rollback(stepExecution); stepExecution.upgradeStatus(BatchStatus.UNKNOWN); stepExecution.setTerminateOnly(); } } finally { // Only release the lock if we acquired it, and release as late // as possible if (locked) { semaphore.release(); } locked = false; } }
}}
Since then our problem was gone.
Marcus
James Home commented
@Michael
Regarding the comment: "With regards to why the copy method restores the version number...why wouldn't it? The intent is that there was a rollback so the state should match what was there before the rollback."
The thing is, it's not actually a rollback in the database sense. You're really just resetting the counts the their previous values. It's a new update in a db sense, so surely it should use the appropriate version number. The policy of 'rolling back' the version number in a db update seems to be incompatible with the principle of version based optimistic locking that's being used by the dao.
zyro commented
following the suggestion to flush the session in the writer, i think there is still a (slight..) chance for an exception being thrown (e.g. StaleObjectStateException) during the actual commit (which may flush again).
when using a skip policy, shouldnt that exception result in a skip and not break the whole job execution leaving it in an UNKNOWN state?