
Autodetect delimiter in CSV/TSV files

Open akash-rajput opened this issue 5 years ago • 4 comments

Is your feature request related to a problem? Please describe.
If the data source sends files that use different delimiters, it should be possible to detect the delimiter automatically.

Describe the solution you'd like
A simple string comparison over the first few lines can find the character whose per-line count matches the column count, and therefore the suitable delimiter.
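
A minimal sketch of that heuristic, assuming a fixed list of candidate delimiters. `detectDelimiter` is a hypothetical helper, not part of csv-parse: it counts each candidate on every sampled line and keeps the one that occurs the same non-zero number of times on all of them. It ignores quoting, so a delimiter inside a quoted field would skew the counts.

```javascript
// Hypothetical helper (not a csv-parse API): choose the candidate whose
// per-line count is identical on every sampled line.
function detectDelimiter(text, candidates = [",", ";", "\t", "|"], sampleLines = 5) {
  // Sample the first few non-empty lines.
  const lines = text.split(/\r?\n/).filter((l) => l.length > 0).slice(0, sampleLines);
  let best = null;
  let bestCount = 0;
  for (const delim of candidates) {
    // Occurrences of this candidate on each line (column count = occurrences + 1).
    const counts = lines.map((line) => line.split(delim).length - 1);
    const first = counts[0];
    const consistent = first > 0 && counts.every((c) => c === first);
    // Prefer the highest consistent count; on a tie, the earlier candidate wins.
    if (consistent && first > bestCount) {
      best = delim;
      bestCount = first;
    }
  }
  return best; // null when no candidate is consistent
}

const detected = detectDelimiter("name;age;city\nAda;36;London\nAlan;41;Wilmslow"); // ";"
```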

Describe alternatives you've considered
N/A

Additional context
N/A

akash-rajput, Nov 15 '19 06:11

So the idea could be this: if the existing delimiter option, a new auto_delimiter option, or a combination of both (as in the examples below) is set to an array of candidate delimiters or to true (meaning the most common delimiters), auto-detection is activated, and the first character matching the set defines the delimiter for the rest of the data set, right?

Setting delimiter to true activates auto-detection:

parse("a,b|c\n1,2|3", {delimiter: true}, function(err, data){
  data.should.eql([
    ["a", "b|c"],
    ["1", "2|3"],
  ])
})
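
The first-match rule proposed above can be sketched as follows. `firstMatchingDelimiter` and the default list standing in for `delimiter: true` are hypothetical, not existing csv-parse options; the naive `split` ignores quoting and only serves to check the result against the first example.

```javascript
// Assumed meaning of `delimiter: true`: a short list of common delimiters.
const COMMON_DELIMITERS = [",", ";", "\t", "|"];

// Hypothetical sketch: return the first character of the input that belongs
// to the candidate set; it then fixes the delimiter for the whole data set.
function firstMatchingDelimiter(text, candidates = COMMON_DELIMITERS) {
  const set = new Set(candidates);
  for (const ch of text) {
    if (set.has(ch)) return ch;
  }
  return null; // no candidate found; the caller could fall back to a default
}

const input = "a,b|c\n1,2|3";
const delimiter = firstMatchingDelimiter(input); // ","
// Naive split (no quote handling) to compare with the expected rows above.
const rows = input.split("\n").map((line) => line.split(delimiter));
```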

auto_delimiter provides a list of potentially accepted delimiters:

parse("a,b|c\n1,2|3", {delimiter: true, auto_delimiter: ["|", ","]}, function(err, data){
  data.should.eql([
    ["a,b", "c"],
    ["1,2", "3"],
  ])
})

Any comments?

wdavidw, Nov 15 '19 08:11

What if the delimiter isn't a common one but an arbitrary character like ^? Can we somehow detect any delimiter, the way Google Sheets or Excel do?
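
One way to handle arbitrary separators such as ^, sketched under the assumption that any character other than a letter, digit, quote or space may be a delimiter: build a per-line frequency table and keep the character that appears the same number of times on every sampled line. `sniffDelimiter` is a hypothetical name; the idea is similar in spirit to Python's `csv.Sniffer` but much simpler, and it makes no claim about how Google Sheets or Excel actually work.

```javascript
// Hypothetical sketch: no predefined candidate list; every character that is
// not a letter, digit, quote or space is treated as a possible delimiter.
function sniffDelimiter(text, sampleLines = 10) {
  const lines = text.split(/\r?\n/).filter((l) => l.length > 0).slice(0, sampleLines);
  if (lines.length === 0) return null;

  // Per-line frequency table of candidate characters.
  const perLine = lines.map((line) => {
    const freq = new Map();
    for (const ch of line) {
      if (/[\p{L}\p{N}"' ]/u.test(ch)) continue; // skip letters, digits, quotes, spaces
      freq.set(ch, (freq.get(ch) || 0) + 1);
    }
    return freq;
  });

  let best = null;
  let bestCount = 0;
  // A consistent delimiter must already appear on the first line.
  for (const [ch, count] of perLine[0]) {
    const consistent = perLine.every((freq) => freq.get(ch) === count);
    if (consistent && count > bestCount) {
      best = ch;
      bestCount = count;
    }
  }
  return best;
}

const sniffed = sniffDelimiter("id^name^score\n1^Ada^9.5\n2^Alan^8.75"); // "^"
```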

ajaz-ur-rehman, Oct 18 '20 15:10

I am personally quite uncomfortable with this issue because it implies storing the first few lines in memory and going back over them once we decide on a delimiter. It feels more appropriate to write a dedicated stream transform, plugged in just before csv-parse, that determines the delimiter.
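
A rough sketch of that design using only Node's `stream` module, under the assumption that a consistent-count heuristic is good enough: a hypothetical `DelimiterDetector` Transform holds back chunks until it has seen a few lines (or the input ends), picks a delimiter, emits a `delimiter` event, then forwards all data unchanged. The buffering stays in this one stage, so the parser never has to go back over data.

```javascript
const { Transform } = require("node:stream");
const { StringDecoder } = require("node:string_decoder");

// Hypothetical stage placed in front of a CSV parser (not part of csv-parse).
class DelimiterDetector extends Transform {
  constructor({ candidates = [",", ";", "\t", "|"], sampleLines = 5 } = {}) {
    super();
    this.candidates = candidates;
    this.sampleLines = sampleLines;
    this.decoder = new StringDecoder("utf8"); // safe across split multi-byte chars
    this.buffered = []; // raw chunks held back until detection
    this.sample = "";   // decoded text used for detection
    this.delimiter = undefined; // undefined = not detected yet
  }

  _transform(chunk, encoding, callback) {
    if (this.delimiter !== undefined) return callback(null, chunk); // pass-through
    this.buffered.push(chunk);
    this.sample += this.decoder.write(chunk);
    const newlines = (this.sample.match(/\n/g) || []).length;
    if (newlines >= this.sampleLines) this._detectAndRelease();
    callback();
  }

  _flush(callback) {
    // Short input: detect from whatever arrived before the end.
    if (this.delimiter === undefined) {
      this.sample += this.decoder.end();
      this._detectAndRelease();
    }
    callback();
  }

  _detectAndRelease() {
    const lines = this.sample.split(/\r?\n/).filter((l) => l.length > 0).slice(0, this.sampleLines);
    let best = null;
    let bestCount = 0;
    for (const d of this.candidates) {
      const counts = lines.map((l) => l.split(d).length - 1);
      if (counts.length > 0 && counts[0] > bestCount && counts.every((c) => c === counts[0])) {
        best = d;
        bestCount = counts[0];
      }
    }
    this.delimiter = best; // null if nothing was consistent
    this.emit("delimiter", best); // listeners can configure the downstream parser here
    for (const c of this.buffered) this.push(c); // release held-back data in order
    this.buffered = [];
  }
}
```

With csv-parse, a `delimiter` listener could create the parser with the detected value (for example via its `delimiter` option) and pipe the detector into it; the exact import depends on the csv-parse version in use.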

wdavidw, Oct 19 '20 06:10

You are right. That makes more sense.

ajaz-ur-rehman, Oct 19 '20 07:10