Repetitive execution of FaultTolerantChunkProcessor's method process

Open alappmeng opened this issue 2 years ago • 6 comments

Bug description When FaultTolerantChunkProcessor's process method throws an exception (one that is not retryable), the process method is executed once more.

Environment The version of the spring-batch-core that I use is 5.0.3.

Steps to reproduce Steps to reproduce the issue.

Expected behavior When the process method throws an exception that does not need to be retried, process should not be executed again.

alappmeng avatar Nov 12 '23 01:11 alappmeng

How is the fault-tolerant step configured? Have you added that type of exception to the non-retryable ones with FaultTolerantStepBuilder#noRetry? It could be that chunk scanning is triggered when it shouldn't be.

Please elaborate with a code example to help us reproduce your issue and help you efficiently.

fmbenhassine avatar Nov 17 '23 07:11 fmbenhassine

Dear fmbenhassine,

I have run into the same problem with Spring Batch 4.3.2. The step is configured as follows:

@Bean
public Step expiryPolicyStep() {
    return stepBuilderFactory.get("expiryPolicyStep")
            .<PolicyContent, PolicyContent>chunk(1)
            .reader(expiryPolicyReader(null, null, null))
            .writer(expiryPolicyWriter(null, null))
            .faultTolerant()
            .skip(Exception.class)
            .skipLimit(4000)
            .noRetry(Exception.class)
            .taskExecutor(threadPoolTaskExecutor)
            .build();
}

When an exception is thrown in the writer, FaultTolerantChunkProcessor calls the writer again through the scan method invoked from the RecoveryCallback. I configured the step with readerIsTransactionalQueue, which sets the buffering flag to false, and this avoids the repeated execution. The code is as follows:

return stepBuilderFactory.get("expiryPolicyStep")
        .<PolicyContent, PolicyContent>chunk(1)
        .reader(expiryPolicyReader(null, null, null))
        .writer(expiryPolicyWriter(null, null))
        .readerIsTransactionalQueue()
        .faultTolerant()
        .skip(Exception.class)
        .skipLimit(4000)
        .noRetry(Exception.class)
        .taskExecutor(threadPoolTaskExecutor)
        .build();

I look forward to this issue being fixed as soon as possible.

chengwei710 avatar Feb 27 '24 09:02 chengwei710

@fmbenhassine

chengwei710 avatar Feb 28 '24 02:02 chengwei710

Chunk scanning is activated when a skippable exception is thrown from the chunk processor. In that case, the chunk is resized to 1 and items are reprocessed (process + write) one by one, each in its own transaction. So you might think items are being retried, but that is not the case: this is part of the scanning process. @alappmeng I think this is your case (so it is normal to see the process method being re-executed). I suggest you check the chunk scanning samples and try out various retry and skip policies.
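To make the scanning behavior concrete, here is a minimal plain-Java sketch of the idea. It is a simplified model and not Spring Batch's actual implementation: the class ChunkScanSketch, the method writeWithScan and the Result holder are hypothetical names, only the write side is modeled, and transactions are not modeled at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of chunk scanning. All names here are hypothetical;
// this is NOT the code of FaultTolerantChunkProcessor.
final class ChunkScanSketch {

    /** Outcome of handling one chunk. */
    static final class Result {
        final List<Integer> written = new ArrayList<>();
        final List<Integer> skipped = new ArrayList<>();
        int writerCalls = 0;
    }

    static Result writeWithScan(List<Integer> chunk, Consumer<List<Integer>> writer) {
        Result result = new Result();
        try {
            result.writerCalls++;
            writer.accept(chunk);               // first attempt: the whole chunk
            result.written.addAll(chunk);
        } catch (RuntimeException chunkFailure) {
            // The failing item is unknown, so scan: write items one at a time.
            for (Integer item : chunk) {
                try {
                    result.writerCalls++;
                    writer.accept(List.of(item));
                    result.written.add(item);
                } catch (RuntimeException itemFailure) {
                    result.skipped.add(item);   // this item is the one to skip
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical writer that rejects any chunk containing the item 2.
        Consumer<List<Integer>> writer = items -> {
            if (items.contains(2)) {
                throw new IllegalStateException("cannot write 2");
            }
        };
        Result r = writeWithScan(List.of(1, 2, 3), writer);
        System.out.println("written=" + r.written + " skipped=" + r.skipped
                + " writerCalls=" + r.writerCalls);
    }
}
```

In this model a chunk of size 1 whose only item fails still reaches the writer twice: once for the original attempt and once during the scan. That matches the repeated execution reported in this thread, without any retry being involved.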

@chengwei710 A skip is a recovery option for an exhausted retry. So setting .skip(Exception.class) and .noRetry(Exception.class) is actually incompatible. I believe this is similar to https://stackoverflow.com/questions/74482257 and https://stackoverflow.com/questions/61756484, which you might find useful.

fmbenhassine avatar Mar 04 '24 14:03 fmbenhassine

Dear @fmbenhassine,

I don't quite understand this point. When the chunk size of the job is 1 and a specific exception is thrown in the chunk, is it necessary to rescan the items and retry them? I just want my job to skip the exception without retrying. If I configure the job without noRetry, it retries when an exception is raised. So how should I configure the job?

chengwei710 avatar Mar 18 '24 10:03 chengwei710

My writer makes requests to an external server over the network, rendering transaction rollback completely ineffective.

new StepBuilder("step", jobRepository).<Integer, Integer>chunk(3, this.transactionManager)
     .reader(itemReader())
     .processor(itemProcessor())
     .writer(itemWriter())
     .faultTolerant()
     .skip(Exception.class)
     .build();

Recovery is triggered. While there are no issues when using transactions, my writer primarily performs external communication requests. If an exception occurs in the chunk, the external requests made before the exception will be repeated during recovery, so those items are sent at least twice. Since the ItemWriter interface is designed to always receive a chunk, it becomes impossible to determine which item caused the exception and which items were processed correctly. This fundamentally complicates the resolution of the issue. Is it possible to resolve this by moving away from an ItemWriter that accepts chunks? Or should we consider this approach inappropriate for batch processing, which is inherently designed for large-scale operations, and explore alternative solutions?
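One possible direction, sketched here in plain Java under stated assumptions, is to make the external call idempotent on the writer side, so that a re-invocation during recovery does not resend items that already went out. IdempotentSenderSketch and its members are hypothetical names, not a Spring Batch API; the in-memory set stands in for whatever durable record of sent items a real job would need, and items are assumed to be usable as their own idempotency key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: remembers which items were already sent so that a
// repeated write of the same items does not repeat the external request.
final class IdempotentSenderSketch {

    // Assumption: in a real job this would be a durable store, not a HashSet.
    private final Set<Integer> alreadySent = new HashSet<>();
    private final Consumer<Integer> remoteCall;

    IdempotentSenderSketch(Consumer<Integer> remoteCall) {
        this.remoteCall = remoteCall;
    }

    /** Same shape as a chunk write: receives a list of items, may throw. */
    void write(List<Integer> items) {
        for (Integer item : items) {
            if (alreadySent.contains(item)) {
                continue;                 // sent in an earlier attempt, do not resend
            }
            remoteCall.accept(item);      // may throw; the item then stays unsent
            alreadySent.add(item);
        }
    }

    public static void main(String[] args) {
        List<Integer> requests = new ArrayList<>();
        // Hypothetical remote endpoint that rejects the item 2.
        IdempotentSenderSketch sender = new IdempotentSenderSketch(item -> {
            requests.add(item);
            if (item == 2) {
                throw new IllegalStateException("remote rejected 2");
            }
        });
        try {
            sender.write(List.of(1, 2, 3));   // whole chunk: sends 1, fails on 2
        } catch (IllegalStateException e) {
            // Recovery would now write the items one by one.
        }
        sender.write(List.of(1));             // no request: 1 was already sent
        sender.write(List.of(3));             // first request for 3
        System.out.println("requests=" + requests);
    }
}
```

The trade-off is that the record of sent items sits outside the chunk transaction, so it has to be kept consistent with the remote system by other means, for example an idempotency key that the remote server itself honors.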

JopopScript avatar Dec 30 '24 16:12 JopopScript