node-csv-parse
Autodetect delimiter in the csv/tsv files
Is your feature request related to a problem? Please describe.
If a data source sends files that use different delimiters, it should be possible to detect the delimiter automatically.
Describe the solution you'd like
A simple string comparison over the first few lines can find the character whose occurrences match the column count, and so identify the suitable delimiter.
Describe alternatives you've considered
N/A
Additional context
N/A
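As a rough illustration of that counting idea, here is a minimal sketch. The `detectDelimiter` helper, its candidate list, and the five-line sample size are all assumptions made for this example and are not part of csv-parse. It keeps a candidate only if it occurs the same non-zero number of times on every sampled line, and it ignores quoting entirely.

```javascript
// Hypothetical helper, not part of csv-parse: guess the delimiter by
// counting how often each candidate occurs on the first few lines.
function detectDelimiter(sample, candidates = [",", ";", "\t", "|"], maxLines = 5) {
  // Keep only the first few non-empty lines of the sample.
  const lines = sample
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);

  let best = null;
  let bestCount = 0;
  for (const candidate of candidates) {
    // Occurrences of this candidate on each sampled line.
    const counts = lines.map((line) => line.split(candidate).length - 1);
    // A plausible delimiter appears the same number of times on every line.
    const consistent = counts.length > 0 && counts.every((n) => n === counts[0]);
    // On a tie, the candidate listed first wins (strict comparison).
    if (consistent && counts[0] > bestCount) {
      best = candidate;
      bestCount = counts[0];
    }
  }
  return best; // null when no candidate fits
}
```

A quoted field that contains a candidate character would skew the counts, so a real implementation would probably need to be quote-aware.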
So the idea could be that if the existing delimiter option, a new auto_delimiter option, or a combination of both (as in the examples below) is set to an array of candidate delimiter characters or to true (which expands to the most common delimiters), auto-detection is activated and the first character matching the set defines the delimiter for the rest of the data set, right?
delimiter set to true activates auto-detection:
parse("a,b|c\n1,2|3", {delimiter: true}, function(err, data){
  data.should.eql([
    ["a", "b|c"],
    ["1", "2|3"],
  ])
})
auto_delimiter provides a list of potentially accepted delimiters:
parse("a,b|c\n1,2|3", {delimiter: true, auto_delimiter: ["|", ","]}, function(err, data){
  data.should.eql([
    ["a,b", "c"],
    ["1,2", "3"],
  ])
})
Any comments?
What if the delimiter isn't a commonly used one and is just an arbitrary character like ^? Can we somehow detect any delimiter, the way Google Sheets or Excel do?
I am personally quite uncomfortable with this issue because it implies storing the first few lines in memory and going back over them once a delimiter has been decided. It feels more appropriate to write a dedicated stream transform, plugged in just before csv-parse, that determines the delimiter.
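To make the "dedicated stream transform" idea concrete, here is a minimal sketch that uses only Node's built-in `stream` module. The `DelimiterSniffer` class, its options, and the `delimiter` event are hypothetical names invented for this example and are not part of csv-parse. It buffers input until it has seen a few lines, guesses the delimiter with a naive, quote-unaware count, emits the guess, then replays the buffered data and passes everything after it through unchanged.

```javascript
const { Transform } = require("node:stream");

// Hypothetical pre-pass transform, not part of csv-parse.
class DelimiterSniffer extends Transform {
  constructor({ candidates = [",", ";", "\t", "|"], sampleLines = 5 } = {}) {
    super();
    this.candidates = candidates;
    this.sampleLines = sampleLines;
    this.buffered = []; // chunks held back until a decision is made
    this.decided = false;
  }

  _transform(chunk, encoding, callback) {
    if (this.decided) return callback(null, chunk); // plain pass-through
    this.buffered.push(chunk);
    const text = Buffer.concat(this.buffered).toString("utf8");
    // Decide once at least `sampleLines` complete lines are buffered.
    if (text.split("\n").length > this.sampleLines) this._decide(text);
    callback();
  }

  _flush(callback) {
    // Input ended before enough lines arrived: decide on what we have.
    if (!this.decided) this._decide(Buffer.concat(this.buffered).toString("utf8"));
    callback();
  }

  _decide(text) {
    this.decided = true;
    const lines = text.split(/\r?\n/).filter((l) => l.length > 0).slice(0, this.sampleLines);
    let best = null;
    let bestCount = 0;
    for (const candidate of this.candidates) {
      const counts = lines.map((l) => l.split(candidate).length - 1);
      const consistent = counts.length > 0 && counts.every((n) => n === counts[0]);
      if (consistent && counts[0] > bestCount) {
        best = candidate;
        bestCount = counts[0];
      }
    }
    this.emit("delimiter", best); // null when nothing matched
    const held = Buffer.concat(this.buffered);
    this.buffered = [];
    if (held.length > 0) this.push(held); // replay the buffered bytes
  }
}
```

A listener on the `delimiter` event could then create the parser with the detected value and pipe the sniffer into it. This keeps the buffering outside csv-parse itself, which appears to be the point of the suggestion.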
You are right. That makes more sense.