
One-line large CSV file is not chunked

Open bogdan-bondar opened this issue 1 year ago • 1 comments

The input file consists of one very long line (row) of the form abc1, abc2, abc3, ..., abc10000000. The file size is about 200 MB. Here is the parsing code:

import React from 'react';
import Papa from 'papaparse';

Papa.LocalChunkSize = 1024 * 1024; // 1 MB

const onUploadCsv = (event: React.ChangeEvent<HTMLInputElement>): void => {
    const file = event.target.files?.[0];
    if (!file) {
        return;
    }

    Papa.parse(file, {
        worker: false,
        header: false,
        // Called once with the raw text of the first chunk
        beforeFirstChunk: chunk => {
            console.log('Before first chunk callback chunk: ' + chunk);
            return chunk;
        },
        // Called once per chunk with the rows parsed from it
        chunk: (results, parser) => {
            console.log(
                'Chunk callback results data length: ' +
                    results.data.length,
            );
            console.log('Chunk callback results data: ' + results.data);
        },
        complete: () => {
            console.log('Complete!');
        },
    });

    // Reset the value so the same file can be uploaded again
    event.target.value = '';
};

The result is that beforeFirstChunk runs as intended, but the chunk callback receives no data on each iteration and then receives the whole line on the last iteration, which defeats the purpose of chunking/streaming. Multi-line (multi-row) files work as intended.

Could someone please explain this behaviour: is this format not supported, or is there a bug in the library?
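
The reported behaviour is what one would expect from any chunker that only emits complete rows: text after the last newline has to be held back until the row ends, so a file with no newline yields nothing until the final chunk. The sketch below is a simplified model of that idea, not PapaParse's actual source; `RowChunker` and `feedChunk` are hypothetical names, and quoting is ignored.

```typescript
// Simplified model of row-aligned chunking. NOT PapaParse's implementation:
// RowChunker and feedChunk are hypothetical names used for illustration only.
class RowChunker {
  // Text received after the last newline seen so far (an unfinished row).
  private partial = '';

  // Returns the complete rows that are available after receiving `chunk`.
  feedChunk(chunk: string, isLast: boolean): string[][] {
    const text = this.partial + chunk;

    if (isLast) {
      // End of input: whatever is buffered is the final row (or rows).
      this.partial = '';
      return text
        .split('\n')
        .filter(row => row !== '')
        .map(row => row.split(','));
    }

    const lastNewline = text.lastIndexOf('\n');
    if (lastNewline === -1) {
      // No row boundary yet: keep buffering and report no rows.
      this.partial = text;
      return [];
    }

    // Emit everything up to the last newline, hold back the remainder.
    this.partial = text.slice(lastNewline + 1);
    return text
      .slice(0, lastNewline)
      .split('\n')
      .map(row => row.split(','));
  }
}

// A single-line input split into three chunks.
const chunker = new RowChunker();
const first = chunker.feedChunk('abc1,ab', false);
const second = chunker.feedChunk('c2,abc', false);
const third = chunker.feedChunk('3', true);
console.log(first.length, second.length, third.length);
```

In this model the first two calls return no rows and the last call returns the entire line as one row, which mirrors the empty chunk callbacks followed by one large result described above.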

bogdan-bondar · Oct 31 '24 15:10

Interesting. Chunking is not designed to work on a single line, and I think adding that would require too many breaking API changes. This also seems like a pretty unreasonable thing to support. I think a good tradeoff for this library, in service of being performant and accessible, is to require that no individual row exceed available memory.
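
For a file like the one in the report, one possible workaround outside the library is to split on the field delimiter instead of the row delimiter. The following is a minimal sketch under stated assumptions: `FieldSplitter` is a hypothetical helper (not part of PapaParse), and it assumes values are never quoted, so the delimiter never appears inside a value.

```typescript
// Hypothetical helper, not part of PapaParse. Splits a stream of text chunks
// into complete fields, carrying the unfinished trailing field over to the
// next chunk. Assumes unquoted values (no delimiter inside a field).
class FieldSplitter {
  private carry = '';
  private delimiter: string;

  constructor(delimiter: string = ',') {
    this.delimiter = delimiter;
  }

  // Returns the fields completed by this chunk (possibly none).
  push(chunk: string): string[] {
    const text = this.carry + chunk;
    const cut = text.lastIndexOf(this.delimiter);
    if (cut === -1) {
      // No delimiter yet, so no field is known to be complete.
      this.carry = text;
      return [];
    }
    this.carry = text.slice(cut + this.delimiter.length);
    return text
      .slice(0, cut)
      .split(this.delimiter)
      .map(field => field.trim());
  }

  // Call once after the last chunk to obtain the final field, if any.
  flush(): string[] {
    const last = this.carry.trim();
    this.carry = '';
    return last === '' ? [] : [last];
  }
}

// Simulated chunks of a single-line file; the cuts fall mid-field.
const splitter = new FieldSplitter();
const fields: string[] = [];
for (const piece of ['abc1, ab', 'c2, abc3', ', abc4\n']) {
  fields.push(...splitter.push(piece));
}
fields.push(...splitter.flush());
console.log(fields.join('|'));
```

In a browser, the chunks could be read with the standard `file.slice(start, end).text()` calls, so that only one slice plus one partial field is held in memory at a time. Decoding slices independently can split a multi-byte character at a slice boundary, so this approach is only straightforward for ASCII data like the example in the report.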

dboskovic · Aug 05 '25 02:08