node-csvtojson

Mixed EOL characters in the same CSV file are not handled

Open zxlin opened this issue 2 years ago • 4 comments

This may be a non-standard CSV format, but if a CSV file uses a carriage return as the EOL character in the first row and then, say, newline characters for the remaining rows, the lib parses all the remaining lines as one giant row. The output is therefore an array containing a single object with a massive number of keys (the example file below yields field7507057 as the last key in the object).

An example of this kind of file is a data file from the US Department of Education: https://nces.ed.gov/surveys/pss/zip/pss1920_pu_csv.zip

This may be outside the scope of this lib to handle, but I wanted to bring it to your attention.

[Screenshot: parsed output showing a single object with thousands of keys]

Repro steps:

Download and unzip the example file

$ csvtojson pss1920_pu_csv > pss.json

See the results:

$ tail -c 100 pss.json
1372549","field7507055":"0","field7507056":"2.94117647058824","field7507057":"5.48387096774194"}

]

zxlin avatar Feb 18 '22 21:02 zxlin

IMHO this lib does not have to support every kind of incorrectly formatted CSV file; there are already users complaining about the size of the lib.

You have to preprocess the file before feeding it into this module. In Node this is quite easy: you can use a stream reader together with this lib.

jfoclpf avatar Mar 21 '22 08:03 jfoclpf

I don't necessarily disagree; as mentioned, this may be out of scope for this lib. The outcome could be a code fix, simple documentation describing how the newline character is auto-detected and used, or no action at all. I just wanted to bring it up with the maintainers here in case this scenario had not been considered.

zxlin avatar Mar 21 '22 11:03 zxlin

Maybe I was not clear enough: this module provides a fromStream method, which you can use to preprocess the file.

Untested code, but something like this:


const fs = require('fs')
const { Transform } = require('stream')
const csv = require('csvtojson')

// Preprocessing step: rewrite each chunk before the CSV parser sees it
const trans = new Transform({
  transform(chunk, encoding, callback) {
    // process chunk, for example chunk.toString().toUpperCase()
    const processedChunk = chunk.toString().toUpperCase()
    callback(null, processedChunk)
  }
})

csv()
  .fromStream(fs.createReadStream('/path/to/file', { encoding: 'utf-8' }).pipe(trans))
  .subscribe(
    (json) => {
      console.log(json)
    },
    (err) => {
      throw err
    },
    () => {
      console.log('success')
    }
  )

jfoclpf avatar Mar 21 '22 11:03 jfoclpf

Thanks, that's exactly what I used to pre-process the file actually.

zxlin avatar Mar 21 '22 11:03 zxlin