Handle non-RFC-compliant backslash-escaped quotes in CSV
I found a quite serious problem with backslash-escaped quotes: parsing fails when the escaped quote is immediately followed by the field separator.
comma-backslash.csv
id,name,price
1,"ESCAPING QUOTES WITH BACKSLASH \" WORKS",123.44
2,"COMBINATION WITH BACKSLASH-ESCAPED QUOTES\", AND COMMA CHAR AFTER QUOTES DOES NOT WORK",666
mlr --csv check comma-backslash.csv
mlr: syntax error: unwrapped double quote at line 2.
The result is exactly the same with a different field separator, e.g. semicolon:
semicolon-backslash.csv
id;name;price
1;"ESCAPING QUOTES WITH BACKSLASH \" WORKS";123.44
2;"COMBINATION WITH BACKSLASH-ESCAPED QUOTES\"; AND SEMICOLON CHAR AFTER QUOTES DOES NOT WORK";666
mlr --csv --ifs semicolon check semicolon-backslash.csv
mlr: syntax error: unwrapped double quote at line 2
When quotes are escaped by doubling them instead, everything works properly:
double-quotes.csv
id;name;price
1;"ESCAPING USING DOUBLE QUOTES "" WORKS";123.44
2;"COMBINATION WITH DOUBLE QUOTES""; AND SEMICOLON CHAR WORKS";666
mlr --csv --ifs semicolon check double-quotes.csv
The issue is that in RFC-compliant CSV (RFC 4180), the way to escape a double quote inside a quoted field is to repeat it: "" rather than \". This is in contrast to spec-compliant JSON, which uses \" rather than "".
Examples:
$ echo '{"a":"b""c""d"}' | jq .
parse error: Expected separator between values at line 1, column 11
$ echo '{"a":"b\"c\"d"}' | jq .
{
  "a": "b\"c\"d"
}
$ echo '{"a":"b""c""d"}' | mlr --ijson --oxtab cat
mlr: Unable to parse JSON data: Line 1 column 0: Expected , before "
$ echo '{"a":"b\"c\"d"}' | mlr --ijson --oxtab cat
a b"c"d
$ mlr --icsv --oxtab cat <<EOF
a,b,c
1,"2,\"3\",4",5
EOF
mlr: syntax error: unwrapped double quote at line 1.
$ mlr --icsv --oxtab cat <<EOF
a,b,c
1,"2,""3"",4",5
EOF
a 1
b 2,"3",4
c 5
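Since the doubled-quote form parses, one possible stopgap is to rewrite \" as "" before Miller sees the input. This is a minimal sketch using only sed; the function name backslash_to_doubled_quotes is hypothetical and not part of Miller. It is a blind text substitution, so it assumes no quoted field genuinely ends in a backslash right before its closing quote.

```shell
# Hypothetical helper (not part of Miller): blindly rewrite every \" as ""
# so that backslash-escaped CSV becomes RFC 4180-style doubled quotes.
# Caveat: a quoted field that really ends in a backslash, such as "C:\",
# is rewritten incorrectly, because sed cannot tell the two cases apart.
backslash_to_doubled_quotes() {
  sed 's/\\"/""/g'
}

# The failing row from above, converted into the form that parses:
printf '%s\n' '1,"2,\"3\",4",5' | backslash_to_doubled_quotes
# prints: 1,"2,""3"",4",5
```

The converted text can then be piped into the same command that already works, e.g. `backslash_to_doubled_quotes < comma-backslash.csv | mlr --icsv --oxtab cat`.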
So this turns out to be a request to handle non-RFC-compliant CSV, which is not a bad idea, but it isn't a bug; it's a design change.
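Handling this dialect properly means tracking whether the reader is inside a quoted field, since a backslash means something different there. The following is a rough sketch of that idea as an external pre-processor, not Miller's reader. The function name backslash_csv_to_rfc and its rules are assumptions: backslash escapes apply only inside quoted fields (\" for a quote, \\ for a backslash), a quote opens a field only at the start of a field, and the separator is a single character passed as the first argument.

```shell
# Hypothetical helper (not part of Miller): convert backslash-escaped CSV
# into RFC 4180 CSV. Assumptions:
#   - backslash is an escape character only inside quoted fields,
#     where \" means a literal quote and \\ means a literal backslash;
#   - a double quote opens a quoted field only at the start of a field;
#   - $1 is a single-character field separator (default ",").
backslash_csv_to_rfc() {
  awk -v sep="${1:-,}" '
  BEGIN { inq = 0 }  # inq is kept across lines, so quoted newlines survive
  {
    out = ""; n = length($0); i = 1
    while (i <= n) {
      c = substr($0, i, 1)
      if (inq && c == "\\" && i < n) {
        d = substr($0, i + 1, 1)
        if (d == "\"") { out = out "\"\""; i += 2; continue }  # escaped quote becomes doubled quote
        if (d == "\\") { out = out "\\"; i += 2; continue }    # escaped backslash becomes one backslash
      }
      if (c == "\"") {
        if (inq) inq = 0                                         # closing quote
        else if (i == 1 || substr($0, i - 1, 1) == sep) inq = 1  # opening quote
      }
      out = out c; i++
    }
    print out
  }'
}
```

Unlike a plain substitution, this keeps a field such as "C:\\" intact (it becomes "C:\"), and leaves a stray quote inside an unquoted field alone. For the semicolon example it might be used as `backslash_csv_to_rfc ';' < semicolon-backslash.csv | mlr --icsv --ifs semicolon --oxtab cat`.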