node-csvtojson
Mixed EOL characters in the same CSV file are not handled
This may be a non-standard CSV format, but if a CSV file uses a carriage return as the EOL character in the first row and then, say, newline characters for the remaining rows, the lib parses the remaining lines as one giant row. The output is then an array containing a single object with a massive number of keys (the example file below yields field7507057 as the last key in the object).
An example of this kind of file is a data file from the US Department of Education: https://nces.ed.gov/surveys/pss/zip/pss1920_pu_csv.zip
This may be outside the scope of this lib to handle, but I wanted to bring it to your attention.
![Screen Shot 2022-02-18 at 4 27 18 PM](https://user-images.githubusercontent.com/5447705/154763598-b2c866e5-a9c2-42ee-af07-53be5efe24de.png)
Repro steps:
Download and unzip the example file
$ csvtojson pss1920_pu_csv > pss.json
See the results:
$ tail -c 100 pss.json
1372549","field7507055":"0","field7507056":"2.94117647058824","field7507057":"5.48387096774194"}
]
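To confirm that a file really does mix line terminators, a quick check like the following can count each style before parsing (this is an illustrative helper, not part of csvtojson):

```javascript
// Count the three line-terminator styles in a string so mixed-EOL
// files can be detected before parsing.
function countEols(text) {
  const crlf = (text.match(/\r\n/g) || []).length;   // Windows "\r\n"
  const lf = (text.match(/(?<!\r)\n/g) || []).length; // Unix "\n" (not part of "\r\n")
  const cr = (text.match(/\r(?!\n)/g) || []).length;  // bare "\r" (old Mac style)
  return { crlf, lf, cr };
}

// A file whose first row ends in "\r" but whose later rows end in "\n":
console.log(countEols('a,b,c\rd,e,f\ng,h,i\n'));
// → { crlf: 0, lf: 2, cr: 1 }
```

If more than one of the counters is non-zero, the file mixes terminators and is likely to trip up a parser that detects the EOL style once from the first row.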
IMHO this lib does not have to support all types of incorrectly formatted CSV files; there are already users complaining about the size of the lib.
You have to preprocess the file before feeding it into this module. In Node this is quite easy, and you can use a stream reader with this lib.
I don't necessarily disagree; as mentioned, this may be out of scope for this lib. The resolution could be a code fix, simple documentation describing how the newline character is auto-detected and used, or no action at all. I just wanted to raise it with the maintainers in case this scenario had not been considered.
Maybe I was not clear enough: this module provides a fromStream method, which you can use to preprocess the file.
Untested code, something like this:

```javascript
const fs = require('fs')
const { Transform } = require('stream')
const csv = require('csvtojson')

const trans = new Transform({
  transform(chunk, encoding, callback) {
    // Process the chunk here, for example:
    const processedChunk = chunk.toString().toUpperCase()
    callback(null, processedChunk)
  },
})

csv()
  .fromStream(fs.createReadStream('/path/to/file', { encoding: 'utf-8' }).pipe(trans))
  .subscribe(
    (json) => {
      console.log(json)
    },
    (err) => {
      throw err
    },
    () => {
      console.log('success')
    }
  )
```
Thanks, that's exactly what I used to pre-process the file actually.