PapaParse
Wrong error message thrown on parsing a large file (> 500mb)
Description of issue
Parsing a large file (> 500 MB) without streaming throws a misleading error message.

Steps to reproduce the issue in the demo server (https://www.papaparse.com/demo)
- Choose a large file (probably > 500 MB).
- Click on Parse (make sure streaming is off) and check the console; the error should be visible.

Expected Behaviour
A proper error message should be shown. The error doesn't happen when the streaming option is set, so it probably has something to do with the file size. The current error message is misleading because it reports a problem auto-detecting the delimiter.
I guess the problem is related to some memory limit which is not hit on streaming.
Not sure if we can do more in this case.
Yeah, the issue seems to be related to memory, but are you aware of any specific reason why the delimiter error is thrown rather than some other error?
First the delimiter is guessed: https://github.com/mholt/PapaParse/blob/master/papaparse.js#L1074 (before any "real" parsing). The delimiter-guess uses the same parsing method as the "real" parsing. So I guess the delimiter-guess-parsing throws... but I'm surprised you get any output (I don't see many try catches in the code).
Is there any update on this issue? I have the same problem with large files (approx. 1 GB).
As a workaround, you could check the file size before passing the file to PapaParse (see File.prototype.size at https://developer.mozilla.org/en-US/docs/Web/API/File). If you need large file support, use streaming.
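The size check suggested above could look roughly like this. It is only a sketch: `MAX_IN_MEMORY_BYTES` and `chooseParseMode` are hypothetical names, and the 500 MB threshold is taken from the reports in this thread, not from any documented PapaParse limit.

```typescript
// Hypothetical threshold: the thread only reports failures somewhere above ~500 MB.
const MAX_IN_MEMORY_BYTES = 500 * 1024 * 1024;

type ParseMode = "in-memory" | "streaming";

// Accepts anything with a numeric `size`, which includes a browser File.
function chooseParseMode(
  file: { size: number },
  limit: number = MAX_IN_MEMORY_BYTES
): ParseMode {
  // Files strictly larger than the limit should be parsed with streaming.
  return file.size > limit ? "streaming" : "in-memory";
}
```

When the result is `"streaming"`, pass a `step` or `chunk` callback to `Papa.parse` so the file is not read in one piece.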
In my use case, I have large CSV files that need to be processed, and before processing them I need to read the column names. PapaParse works like a charm for files smaller than 500 MB, but for larger files this error is shown. As a workaround, I abort further parsing in the step callback, which makes the first couple of lines available as the result in the complete callback.
step: (results: ParseResult, parser: PapaParseParser) => parser.abort(),
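A slightly fuller sketch of this workaround, for anyone who wants to copy it. The interfaces and `makeHeaderReader` are hypothetical and only mirror the parts of the PapaParse config used here. It assumes each `step` call delivers a single row as an array of fields in `results.data` (the behaviour with `header` left off), and it captures the row in `step` itself, not in `complete`.

```typescript
// Minimal structural types (assumptions, not PapaParse's own typings).
interface StepParser { abort(): void; }
interface StepResult { data: unknown; }
interface HeaderReaderConfig {
  step: (results: StepResult, parser: StepParser) => void;
  complete: () => void;
}

// Builds a config fragment that reads only the first row, then stops parsing.
function makeHeaderReader(
  onHeader: (columns: string[]) => void
): HeaderReaderConfig {
  let header: string[] | null = null;
  return {
    step: (results, parser) => {
      // Keep the first row only; treat its fields as the column names.
      if (header === null && Array.isArray(results.data)) {
        header = results.data.map(String);
      }
      // Stop immediately so the rest of the file is never parsed.
      parser.abort();
    },
    // Report whatever was captured (an empty list if no row arrived).
    complete: () => onHeader(header ?? []),
  };
}
```

Usage would be along the lines of `Papa.parse(file, makeHeaderReader(cols => console.log(cols)))`. Because a `step` callback is supplied, the file is read in chunks, which is why this avoids the error seen on large files.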