
A lone double-quote in a CSV field behaves differently from clojure.data.csv and Python

Open: maxthoursie opened this issue 3 years ago • 2 comments


  (csv/read-csv "a,3\"\nb,4\"\nc,5")
  ;; => (["a" "3\""] ["b" "4\""] ["c" "5"])

  (charred/read-csv "a,3\"\nb,4\"\nc,5")
  ;; => (["a" "3\nb,4"] ["c" "5"])

I find it hard to say which is "right". I ran into this with a dataset that uses the double-quote as the unit symbol for inches. In this case both data.csv and Python's csv library do the correct thing, while charred merges two rows into one. So for compatibility I would prefer charred to do the same.

The example input used:

a,3"
b,4"
c,5

Without the last row, I get an exception with charred.
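For comparison, the same input can be fed to Python's standard `csv` module. This is a minimal sketch, assuming Python 3 and the default `excel` dialect; `io.StringIO` stands in for a file so the snippet is self-contained. Python keeps the inch marks as literal characters, and dropping the last row does not raise an error:

```python
import csv
import io

# Input from this issue: the quotes mark inches, they are not CSV quoting.
data = 'a,3"\nb,4"\nc,5'

rows = list(csv.reader(io.StringIO(data)))
print(rows)  # [['a', '3"'], ['b', '4"'], ['c', '5']]

# Same input without the last row: no exception with Python's reader.
short_rows = list(csv.reader(io.StringIO('a,3"\nb,4"')))
print(short_rows)  # [['a', '3"'], ['b', '4"']]
```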

maxthoursie, Jun 30 '22

As far as I can tell, the other parsers mentioned work like this: a double-quote that is not at the beginning of a field is treated as a literal character, not as the start of a quoted section.

See this Python example:

import csv
# Read-only is enough; newline="" is what the csv docs recommend for reader files.
with open("test.csv", newline="") as h:
    for row in csv.reader(h):
        print(row)

Input:

a,3"
b,4"
c,"Another
line"

Output:

['a', '3"']
['b', '4"']
['c', 'Another\nline']
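The rule described above can be sketched as a small state machine. `parse_csv` below is a hypothetical helper written for illustration only, not charred's, data.csv's or Python's actual implementation; it assumes `\n` line endings and single-character delimiter and quote. A quote opens a quoted section only when it is the first character of a field; inside a quoted section, `""` stands for one literal quote:

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Hypothetical minimal CSV parser (illustrative sketch)."""
    rows, row, field = [], [], []
    in_quotes = False
    at_field_start = True
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # "" inside quotes is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif c == quote and at_field_start:
            in_quotes = True  # quote at the start of a field opens quoting
            at_field_start = False
        elif c == delimiter:
            row.append("".join(field))
            field = []
            at_field_start = True
        elif c == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            at_field_start = True
        else:
            field.append(c)  # includes a quote that is NOT at the field start
            at_field_start = False
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    # Flush the last row when the text does not end with a newline.
    if row or field or not at_field_start:
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('a,3"\nb,4"\nc,"Another\nline"'))
# [['a', '3"'], ['b', '4"'], ['c', 'Another\nline']]
```

Under these assumptions the sketch reproduces the Python output shown above, while a quote that opens a field but is never closed still raises an error.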

maxthoursie, Jun 30 '22

It looks to me like the quotes are escaped, which should be totally fine. Agreed, charred does the wrong thing here.

cnuernber, Jun 30 '22

Fixed in release 1.012.

cnuernber, Aug 31 '22

Thanks for the issue, btw. Your analysis was spot on.

It was actually something I noticed when reading the data.csv source code and was uncertain about; I wanted to check what other CSV parsers did before implementing it, as it is kind of a narrowing definition.

cnuernber, Aug 31 '22