Preview of first row fails with "UndetectableDelimiter" error if file is too large
I'm using the preview option when parsing to validate that the header row of a file matches a set of expected headers before attempting to upload:
Papa.parse(file, {
  preview: 1,
  complete: (results) => {
    const headers = results.data[0];
    const doHeadersMatch = headers?.every(
      (header, idx) => header === EXPECTED_HEADERS[idx]
    );
  },
});
I have three files that I'm using to test this feature, 100 MB, 300 MB, and 700 MB in size. All three have identical headers; I created the 100 MB and 300 MB versions by deleting rows from the original 700 MB test file. Even so, parsing the 700 MB version gives me an empty data field and the following error, while the other two parse without issue:
{
  "type": "Delimiter",
  "code": "UndetectableDelimiter",
  "message": "Unable to auto-detect delimiting character; defaulted to ','"
}
Specifying the delimiter and newline options when parsing does not get rid of the error. The error also only appears after a delay of several seconds, which surprised me, since I'm only trying to preview a single line. I'd really appreciate any help or suggested workarounds!
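For reference, this is roughly the configuration I tried (the delimiter and newline values here are placeholders; I set them to match my files):

Papa.parse(file, {
  preview: 1,
  delimiter: ",", // set explicitly, so auto-detection shouldn't even run
  newline: "\n",  // likewise set explicitly
  complete: (results) => {
    console.log(results.errors); // still reports UndetectableDelimiter
  },
});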
If you use the streaming setup (a step callback, optionally combined with worker: true so parsing happens off the main thread), PapaParse can handle much larger files.
Here is how it's normally used:
let results: string[][] = [];
Papa.parse(file, {
  preview: 100000,
  worker: true, // parse in a web worker instead of the main thread
  step: (result) => {
    // Called once per parsed row (not per chunk; that's the separate `chunk` callback).
    if (result.data) {
      results.push(result.data);
    }
  },
  complete: () => {
    // All previewed rows have been collected.
    console.log(results);
  },
});
But since you're only reading one row, you can simplify it to this:
Papa.parse(file, {
  preview: 1,
  worker: true,
  step: (result) => {
    // Called exactly once, with the first (and only) row.
    console.log(result.data);
  },
});
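And to tie it back to your header check, a minimal sketch (assuming the EXPECTED_HEADERS array from your snippet; I've also added a length check so a file with too few columns fails the comparison):

Papa.parse(file, {
  preview: 1,
  worker: true,
  step: (result) => {
    // With step, result.data is a single row, i.e. the header row here.
    const headers = result.data as string[];
    const doHeadersMatch =
      headers.length === EXPECTED_HEADERS.length &&
      headers.every((header, idx) => header === EXPECTED_HEADERS[idx]);
    // ...proceed with the upload or surface a validation error
    console.log(doHeadersMatch);
  },
});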