PapaParse

beforeFirstChunk inconsistent chunk size

Status: Open. JCatucSignol opened this issue 3 years ago • 0 comments

Hello, I'm processing a range of CSV files with varying numbers of columns, and I'm using 'beforeFirstChunk' to sanitize the headers.

This is the configuration:


    Papa.parse(file, {
      header: true,
      preview: endRow,
      step: (result, parser) => {
        // parse and do calculations on values
      },
      beforeFirstChunk: (chunk = '') => {
        this._logger.debug(`Papaparse first chunk length = ${chunk.length}`)
        // sanitize headers, then return the modified chunk so Papa Parse uses it
        return chunk
      },
      transformHeader: header => this.replaceHeader(header),
      error: error => {
        // throw
      },
      complete: () => {
        // finish processing
      }
    })

The problem is in the beforeFirstChunk function. For the same file, the logged length of the first chunk is sometimes "Papaparse first chunk length = 379", while most of the time it is "Papaparse first chunk length = 16384".

When the first chunk is that small, it does not contain the complete header row, so processing fails further down the line: the parser expects fewer columns than it actually reads.
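A minimal defensive sketch, assuming the header sanitization only needs the first line of the first chunk. Papa Parse's documentation describes beforeFirstChunk as receiving the chunk about to be parsed and optionally returning a modified chunk, and notes that this works only as long as the header fits in a single chunk. The callback below therefore returns the rewritten chunk and throws when no line break is present, instead of silently continuing with a truncated header. The names sanitizeHeaderCell and makeBeforeFirstChunk, and the sanitizing rule itself, are hypothetical stand-ins for the unshown "sanitize headers" step; the naive split assumes header cells contain no quoted delimiters or line breaks.

```javascript
// Hypothetical sanitizing rule (stand-in for the unshown "sanitize headers" step):
// trim each header cell and collapse runs of non-alphanumeric characters into "_".
function sanitizeHeaderCell(cell) {
  return cell.trim().replace(/[^A-Za-z0-9]+/g, '_');
}

// Hypothetical factory for a beforeFirstChunk callback. The string it returns
// replaces the chunk that Papa Parse is about to parse.
function makeBeforeFirstChunk(delimiter = ',') {
  return (chunk = '') => {
    const lineBreak = /\r\n|\n|\r/.exec(chunk);
    if (lineBreak === null) {
      // No line break: the header row may be cut off, so fail loudly instead of
      // parsing with too few columns. Note: this also rejects a header-only
      // file that has no trailing newline.
      throw new Error(
        `First chunk (${chunk.length} chars) does not contain a complete header row`
      );
    }
    const headerLine = chunk.slice(0, lineBreak.index);
    const rest = chunk.slice(lineBreak.index); // keeps the line break and data rows
    // Naive split: assumes no quoted delimiters inside header cells.
    const cleaned = headerLine
      .split(delimiter)
      .map(sanitizeHeaderCell)
      .join(delimiter);
    return cleaned + rest;
  };
}
```

In the configuration above it would be wired in as beforeFirstChunk: makeBeforeFirstChunk(). This does not make the first chunk any bigger; it only turns a silent column mismatch into an explicit error at the point where the header is cut off.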

If anyone knows how to set a fixed size for the first chunk, that would be much appreciated! I tried setting the chunk size, but it didn't solve the problem (besides, the default should be 5 MB, which ought to be more than enough).
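One possible workaround, offered as an unverified sketch. The two observed lengths look like the sizes of the 'data' chunks emitted by the underlying Node.js stream (16384 bytes is 16 KiB, a common stream buffer size), which would also explain why changing the chunk size had no effect. Assuming Papa Parse hands each chunk from a Node.js readable stream to the parser as it arrives, piping the source through a Transform that holds data back until the first line break has arrived should ensure the first chunk contains the whole header row. HeaderBuffer and headerBufferingStream are hypothetical names, and a header cell containing a quoted line break would defeat the simple newline check.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: holds text back until the first line break has been
// seen, so the first piece emitted contains the complete header row.
class HeaderBuffer {
  constructor() {
    this.pending = '';
    this.headerDone = false;
  }

  // Returns the pieces of text that are safe to emit for this input.
  push(text) {
    if (text === '') return [];
    if (this.headerDone) return [text];
    this.pending += text;
    if (!/[\r\n]/.test(this.pending)) return []; // header row still incomplete
    this.headerDone = true;
    const out = this.pending;
    this.pending = '';
    return [out];
  }

  // End of input: release anything still held (e.g. a header with no newline).
  flush() {
    const out = this.pending;
    this.pending = '';
    return out === '' ? [] : [out];
  }
}

// Hypothetical wrapper: a Transform stream placed between the source stream
// and Papa.parse so that the first chunk it emits carries the whole header row.
function headerBufferingStream() {
  const headerBuffer = new HeaderBuffer();
  const decoder = new StringDecoder('utf8');
  return new Transform({
    encoding: 'utf8', // emit strings rather than Buffers on the readable side
    transform(chunk, _encoding, callback) {
      // chunk arrives as a Buffer (decodeStrings defaults to true); decoding
      // incrementally keeps multi-byte characters split across chunks intact.
      for (const piece of headerBuffer.push(decoder.write(chunk))) this.push(piece);
      callback();
    },
    flush(callback) {
      for (const piece of headerBuffer.push(decoder.end())) this.push(piece);
      for (const piece of headerBuffer.flush()) this.push(piece);
      callback();
    },
  });
}
```

Assuming file is a Node.js readable stream (for example the body of an S3 object), it would be used as Papa.parse(file.pipe(headerBufferingStream()), config) with the same configuration as before.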

This code is running in a Node.js AWS Lambda function.

JCatucSignol, Nov 09 '21 12:11