
Improve Result interface

Open gnat42 opened this issue 8 years ago • 1 comment

Allow the Offset filter to stop the process function loop. The impetus is the lack of detail in the Result object, particularly when using the Offset filter. When processing a file in chunks, the StepAggregator reports the number of successfully imported rows and the number of errors, but not the number of skipped rows. This change lets a user seek the reader, process a batch, and then stop, receiving accurate counts of processed, skipped and errored rows.

gnat42 avatar Aug 21 '15 17:08 gnat42

To give some added detail on the impetus for this:

We're building a system that allows users to build maps and import data. The backend of the import process is this library. However, the use case includes uploading files with 150 000 rows, which obviously need to be processed in chunks. When importing, I was using the OffsetFilter to move into position and then process X rows of the source file. I then moved forward based on the totalProcessedCount of the Result object. However, totalProcessedCount was really only a count of imported + errors; it excluded any skipped rows. Once I realized this, I started digging.

This led me to realize that the OffsetFilter doesn't affect the reader, so the reader still reads through all rows unnecessarily. So I added to the OffsetFilter the ability to stop the processing by throwing a specific exception. I considered adding one to skip the first set of rows as well, but figured that was a bit odd. So in my use case I call seek() on the reader prior to processing.
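To make the idea concrete, here is a rough, language-neutral sketch in Python rather than the library's actual PHP code. The names `StopProcessing`, `LimitFilter`, `CountingReader` and `process` are all made up for illustration and are not part of data-import. It only shows the shape of "seek first, then let the filter end the loop by throwing":

```python
class StopProcessing(Exception):
    """Hypothetical signal a filter raises to end the processing loop."""


class LimitFilter:
    """Hypothetical filter: accepts `limit` rows, then asks the loop to stop."""

    def __init__(self, limit):
        self.limit = limit
        self.seen = 0

    def __call__(self, row):
        if self.seen >= self.limit:
            raise StopProcessing()
        self.seen += 1
        return True


class CountingReader:
    """Stand-in for a seekable reader that counts how many rows it reads."""

    def __init__(self, rows):
        self.rows = rows
        self.position = 0
        self.reads = 0

    def seek(self, position):
        self.position = position

    def __iter__(self):
        while self.position < len(self.rows):
            self.reads += 1
            row = self.rows[self.position]
            self.position += 1
            yield row


def process(reader, row_filter, written):
    """Feed rows to the filter until it raises StopProcessing."""
    for row in reader:
        try:
            row_filter(row)
        except StopProcessing:
            break  # stop reading instead of walking the rest of the file
        written.append(row)


reader = CountingReader(list(range(1000)))
reader.seek(100)            # jump to the batch start without hydrating rows 0..99
written = []
process(reader, LimitFilter(5), written)
```

In this model the reader touches only the five accepted rows plus the one row that triggers the stop, instead of all 1000.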

The next issue is knowing that I've actually consumed my batch size. I would request that 400 records be processed using OffsetFilter($currentRowPosition, 400) and get a totalProcessedCount of, let's say, 365. This was because 35 rows had some kind of issue and were skipped.

With this change I can know how many rows were processed, skipped or had an exception as well as the number that were imported.
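As a worked illustration of the counting problem, here is a small Python model (not the library's PHP Result class; `BatchResult`, `run_batch` and the validity rule are assumptions invented for the example). It reproduces the 400 requested / 35 skipped / 365 reported numbers from above and shows why the skipped count is needed to advance the position correctly:

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    """Hypothetical result object that tracks skipped rows separately."""
    imported: int = 0
    skipped: int = 0
    errors: int = 0

    @property
    def processed(self):
        # Every row the batch consumed, whatever happened to it.
        return self.imported + self.skipped + self.errors


def run_batch(rows, start, size, is_valid, write):
    """Process rows[start:start + size] and tally each outcome."""
    result = BatchResult()
    for row in rows[start:start + size]:
        if not is_valid(row):
            result.skipped += 1
            continue
        try:
            write(row)
            result.imported += 1
        except Exception:
            result.errors += 1
    return result


rows = list(range(1000))
sink = []
# Assumed rule for the example: the first 35 rows are invalid and get skipped.
result = run_batch(rows, 0, 400, lambda r: r >= 35, sink.append)

old_style_count = result.imported + result.errors   # 365: misses the skipped rows
next_offset = 0 + result.processed                  # 400: the correct next position
```

Advancing by `old_style_count` would restart the next batch at row 365 and re-read 35 rows that were already consumed, whereas `processed` lands exactly on the batch boundary.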

I'm still a bit unsure whether the OffsetFilter should use a SkipException to filter the first set of rows, mainly for a consistent 'API', as it were. Though really, reading through and 'hydrating' the rows just to get to the starting position is a bit weird and inefficient.

gnat42 avatar Aug 21 '15 19:08 gnat42