
Wrong error message thrown on parsing a large file (> 500mb)

Open CKanishka opened this issue 4 years ago • 6 comments

Description of issue

When parsing a large file (> 500 MB) without streaming, a misleading error message is thrown.

Steps to reproduce the issue in the demo server (https://www.papaparse.com/demo)

  • Choose a large file (probably > 500 MB).

  • Click on Parse (make sure streaming is off) and check the console; the error should be visible.

Expected Behaviour

A proper error message should be shown. The error does not occur when the streaming option is set, so it probably has something to do with the file size. The current message is misleading because it says there was an issue auto-detecting the delimiter.

CKanishka avatar Jun 09 '21 11:06 CKanishka

I guess the problem is related to some memory limit which is not hit on streaming.

Not sure if we can do more in this case.

pokoli avatar Jun 09 '21 11:06 pokoli

Yeah, the issue seems to be related to memory, but are you aware of any specific reason why the delimiter error is thrown rather than some other error?

CKanishka avatar Jun 09 '21 12:06 CKanishka

First the delimiter is guessed: https://github.com/mholt/PapaParse/blob/master/papaparse.js#L1074 (before any "real" parsing). The delimiter-guess uses the same parsing method as the "real" parsing. So I guess the delimiter-guess-parsing throws... but I'm surprised you get any output (I don't see many try catches in the code).
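To illustrate the point above, here is a simplified sketch of a delimiter guess that runs before any real parsing. It is not PapaParse's actual code: `guessDelimiter`, `DelimiterGuess`, and the candidate list are hypothetical names chosen for this example. It assumes the guess step receives unusable (here: empty) text when the file is too large to read in one go, which would make "delimiter not detected" the first failure reported.

```typescript
// Hypothetical result shape for this sketch (not a PapaParse type).
interface DelimiterGuess {
  delimiter: string;
  detected: boolean;
}

// Try each candidate on the first few lines and keep the one that splits
// every line into the same number of fields, with more than one field.
function guessDelimiter(
  sample: string,
  candidates: string[] = [",", "\t", "|", ";"]
): DelimiterGuess {
  const lines = sample
    .split("\n")
    .filter((line) => line.length > 0)
    .slice(0, 10);

  let best: string | undefined;
  let bestFields = 1;

  for (const candidate of candidates) {
    if (lines.length === 0) break; // nothing to inspect, e.g. the read failed
    const counts = lines.map((line) => line.split(candidate).length);
    const first = counts[0] ?? 0;
    const consistent = counts.every((count) => count === first);
    if (consistent && first > bestFields) {
      best = candidate;
      bestFields = first;
    }
  }

  // Fall back to a comma and flag the guess as failed; a caller would
  // typically turn `detected: false` into a "could not detect" warning.
  return best === undefined
    ? { delimiter: ",", detected: false }
    : { delimiter: best, detected: true };
}
```

With an empty sample the function returns `detected: false`, so the only visible symptom is a delimiter warning even though the underlying cause lies elsewhere.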

janisdd avatar Jun 12 '21 16:06 janisdd

Is there any update on this issue? I have the same problem with large files (approx. 1 GB).

nikolabojovic97 avatar Feb 23 '22 08:02 nikolabojovic97

As a workaround, check the file size before passing the file to PapaParse (see File.prototype.size: https://developer.mozilla.org/en-US/docs/Web/API/File). If you need large file support, use streaming.
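A minimal sketch of that size check, assuming a threshold based on the roughly 500 MB figure reported in this thread. `chooseParseMode`, `SizedInput`, and `ASSUMED_IN_MEMORY_LIMIT_BYTES` are hypothetical names; the real limit depends on the browser and available memory.

```typescript
// Assumed threshold, taken from the ~500 MB figure in this thread.
// Not a documented PapaParse or browser constant.
const ASSUMED_IN_MEMORY_LIMIT_BYTES = 500 * 1024 * 1024;

type ParseMode = "in-memory" | "streaming";

// Only the `size` property (in bytes) of a File/Blob is needed here.
interface SizedInput {
  size: number;
}

// Decide how to parse based on the input size alone.
function chooseParseMode(
  input: SizedInput,
  limitBytes: number = ASSUMED_IN_MEMORY_LIMIT_BYTES
): ParseMode {
  return input.size > limitBytes ? "streaming" : "in-memory";
}
```

When the result is "streaming", pass a `step` or `chunk` callback to `Papa.parse` so the file is processed piece by piece; otherwise a plain `complete` callback is enough.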

janisdd avatar Feb 23 '22 08:02 janisdd

In my use case, I have large CSV files that need to be processed, and before processing them I need to read the column names. PapaParse works like a charm for files smaller than 500 MB, but for larger files this error is shown. As a workaround, I abort further parsing in the step callback, which causes only the first couple of lines to be returned as the result in the complete callback.

step: (results: ParseResult, parser: PapaParseParser) => parser.abort(),
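For context, the workaround could be wrapped as below. This is a sketch under assumptions: the interfaces only mirror the `step(results, parser)` and `parser.abort()` shape used in the snippet above and are not PapaParse's own type definitions, and `readFirstRow` and `ParseFn` are hypothetical names. The parse function is passed in, so a thin wrapper around `Papa.parse` could be supplied by the caller.

```typescript
// Minimal structural types covering only what this sketch uses.
// They are assumptions, not PapaParse's type definitions.
interface StepResult {
  data: string[];
}
interface ParserHandle {
  abort(): void;
}
interface StepConfig {
  step: (results: StepResult, parser: ParserHandle) => void;
  complete: () => void;
}
type ParseFn<T> = (input: T, config: StepConfig) => void;

// Resolve with the first row (e.g. the column names) and stop parsing.
function readFirstRow<T>(parse: ParseFn<T>, input: T): Promise<string[]> {
  return new Promise<string[]>((resolve) => {
    parse(input, {
      step: (results, parser) => {
        resolve(results.data); // keep the first row
        parser.abort(); // skip the rest of the file
      },
      // Covers empty input; a no-op if the promise is already resolved.
      complete: () => resolve([]),
    });
  });
}
```

Because a `step` callback is supplied, the file is read in streaming mode, so only the beginning of a large file needs to be processed before parsing stops.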

nikolabojovic97 avatar Feb 23 '22 09:02 nikolabojovic97