
Allow dependent steps to start concurrently [BATCH-1538]

Open spring-projects-issues opened this issue 15 years ago • 3 comments

Dave Syer opened BATCH-1538 and commented

Allow dependent steps to start concurrently: step1 can be producing data that are needed in step2, e.g. staging records, and step2 can start to process those as soon as they are available without waiting. All that is needed is a protocol for the steps to agree on whether a dependency is finished or in flight. (In the staging case step1 is hardly ever a limiting factor in terms of execution time, but it might still help.)
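As an illustration of what such a protocol could look like, here is a minimal sketch in plain Java, with no Spring Batch types. The `StagingChannel` class, its method names, and the 50 ms polling interval are all hypothetical and not part of the framework. The producing step publishes items and then marks itself finished; the consuming step keeps polling until the producer has finished and the queue is drained.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical hand-off between a producing step and a consuming step. */
class StagingChannel<T> {

    private final BlockingQueue<T> queue;

    // Set by the producer once it will publish nothing more.
    private volatile boolean finished = false;

    StagingChannel(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Producer side: blocks while the queue is full (back-pressure). */
    void publish(T item) throws InterruptedException {
        queue.put(item);
    }

    /** Producer side: the dependency is no longer in flight. */
    void markFinished() {
        finished = true;
    }

    /**
     * Consumer side: returns the next item, or null once the producer has
     * finished and every published item has been consumed.
     */
    T next() throws InterruptedException {
        while (true) {
            T item = queue.poll(50, TimeUnit.MILLISECONDS); // assumed polling interval
            if (item != null) {
                return item;
            }
            if (finished && queue.isEmpty()) {
                return null;
            }
            // Otherwise the producer is still in flight: keep waiting.
        }
    }
}
```

In this sketch step1 would call `publish` for each staged record and `markFinished` when it completes, while step2 loops on `next()` until it returns null. Restartability is deliberately left out.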


1 votes, 3 watchers

spring-projects-issues, Mar 22 '10 04:03

Giovanni Dall'Oglio Risso commented

Excuse me, but this issue seems very similar to BATCH-1517.

IMHO you could think of something like Spring Integration channels between the two steps:

  • First step
    • when a chunk commits, you push the data onto a queue
      • you can insert the List written by the ItemWriter, or perform some sort of transformation by defining a delegate
  • Second step
    • you use the data coming from the queue in place of the IO-consuming ItemReader
    • you use the real ItemReader only in case of a restart

There are some things to clarify (e.g. how is a restart managed? how are different chunk sizes handled?), but this approach could be feasible.
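A rough sketch of the first-step side of this idea, in plain Java. `ChunkWriter` is a local stand-in for a chunk-oriented writer and `QueueForwardingWriter` is a hypothetical name; neither is a Spring Batch type. Note one simplification: the items are forwarded as soon as the delegate's write returns, which is before the surrounding transaction commits, so a real implementation would need to hook the commit itself to avoid publishing items from a chunk that is later rolled back.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Stand-in for a chunk-oriented writer; not the Spring Batch interface. */
interface ChunkWriter<T> {
    void write(List<? extends T> items) throws Exception;
}

/**
 * Hypothetical decorator: performs the real write first, then forwards each
 * written item (optionally transformed) to the queue read by the second step.
 */
class QueueForwardingWriter<T, R> implements ChunkWriter<T> {

    private final ChunkWriter<T> delegate;
    private final Function<? super T, ? extends R> transformer;
    private final BlockingQueue<R> queue;

    QueueForwardingWriter(ChunkWriter<T> delegate,
                          Function<? super T, ? extends R> transformer,
                          BlockingQueue<R> queue) {
        this.delegate = delegate;
        this.transformer = transformer;
        this.queue = queue;
    }

    @Override
    public void write(List<? extends T> items) throws Exception {
        delegate.write(items); // the real IO happens here
        for (T item : items) {
            // put() blocks when a bounded queue is full, which slows the
            // first step down to the pace of the second one.
            queue.put(transformer.apply(item));
        }
    }
}
```

Passing `Function.identity()` as the transformer forwards the written items unchanged. The second step would read from the same queue and fall back to its real reader on restart, which this sketch does not cover.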

At the moment, to save IO time, we overloaded one step:

  • ItemReader
    • a simple reader
  • ItemProcessor
    • a CompositeItemProcessor with a long list of processors that perform the operations of multiple steps all together
  • ItemWriter
    • a CompositeItemWriter that writes everything (really a lot of things)
      • followed by a list of FilterItemWriter
      • that delegate the real IO to other writers

Obviously, this solution helps to save IO time, but it makes things harder to understand and change, and that is a bad thing. Moreover, this solution keeps the operations single-threaded and sequential.
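To make the "overloaded step" concrete, here is a minimal plain-Java stand-in for the composite processor part. `ChainedProcessor` is a hypothetical name, not the Spring Batch class. Each delegate receives the output of the previous one on the same thread, and a null result filters the item out.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Plain-Java stand-in for a composite processor: each delegate receives the
 * output of the previous one, and a null result filters the item out.
 */
class ChainedProcessor<T> {

    private final List<Function<T, T>> delegates;

    ChainedProcessor(List<Function<T, T>> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    T process(T item) {
        T current = item;
        for (Function<T, T> delegate : delegates) {
            current = delegate.apply(current);
            if (current == null) {
                return null; // item filtered; skip the remaining delegates
            }
        }
        return current;
    }
}
```

Everything that would otherwise be a separate step becomes one more entry in the delegate list. This is why the approach saves IO but stays sequential and grows harder to follow as the list gets longer.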

Your solution (a pipeline) is way better: it enables developers to save IO time, design cleaner jobs, and break the processing into different threads (one per chunk).

spring-projects-issues, Jun 26 '12 04:06

Dave Syer commented

Yes, I agree this is basically a duplicate.

spring-projects-issues, Jun 26 '12 07:06

Mahmoud Ben Hassine commented

I implemented a POC here for concurrent steps using a blocking queue as a staging area. It is typically an implementation of the producer/consumer pattern. However, I don't see (yet) how this could be provided as a built-in feature in the framework.

If the POC makes sense, I would add it as a sample to the samples module rather than implement it as a feature in the framework (other than probably adding the BlockingQueueItemReader and BlockingQueueItemWriter to the library of readers/writers).

Any thoughts?
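For readers who want a feel for what a queue-backed reader/writer pair might look like, here is one possible shape in plain Java. The class names `TimeoutQueueReader` and `QueueWriter` are hypothetical, and the choice to treat a poll timeout as end of input is an assumption made for this sketch, not a description of the POC. The only part borrowed from the framework is the convention that a reader returns null to signal that the input is exhausted.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical queue-backed reader. Like an ItemReader, it returns null to
 * signal end of input; here that happens when no item arrives in time.
 */
class TimeoutQueueReader<T> {

    private final BlockingQueue<T> queue;
    private final long timeout;
    private final TimeUnit unit;

    TimeoutQueueReader(BlockingQueue<T> queue, long timeout, TimeUnit unit) {
        this.queue = queue;
        this.timeout = timeout;
        this.unit = unit;
    }

    /** Waits up to the configured timeout; null means "treat as finished". */
    T read() throws InterruptedException {
        return queue.poll(timeout, unit);
    }
}

/** Hypothetical counterpart: hands each written item to the queue. */
class QueueWriter<T> {

    private final BlockingQueue<T> queue;

    QueueWriter(BlockingQueue<T> queue) {
        this.queue = queue;
    }

    void write(Iterable<? extends T> items) throws InterruptedException {
        for (T item : items) {
            queue.put(item); // blocks while a bounded queue is full
        }
    }
}
```

A timeout keeps the pair simple, but it is a trade-off: if the producing step stalls for longer than the timeout, the consuming step ends early. An explicit end-of-data signal avoids that at the cost of extra coordination between the steps.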

spring-projects-issues, Nov 12 '18 22:11