
Does not handle escaped strings

Open cboscolo opened this issue 8 years ago • 5 comments

I am using csv-streamify in my log parsing project: https://github.com/cboscolo/elb2loggly. Parsing lines with escaped strings does not work properly. For example, using var csvToJson = csv({objectMode: true, delimiter: ' '}); to parse this line:

2016-06-01T14:09:00.418027Z anaconda-org "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1" "pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}" TLSv1.2

Results in this: [ "2016-06-01T14:09:00.418027Z", "anaconda-org", "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1", "pip/8.1.2 {\\openssl_version\\:\\OpenSSL", "1.0.2d", "9", "Jul", "2015\\}", "TLSv1.2" ]

Instead of this: [ "2016-06-01T14:09:00.418027Z", "anaconda-org", "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1", "pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}", "TLSv1.2" ]

Have you considered adding support for escaped strings? If not, do you know of any other csv parsing npm modules that do?
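To make the requested behavior concrete, here is a minimal sketch of a splitter that treats a backslash inside a quoted field as an escape for the next character. `splitLogLine` is a hypothetical helper written for illustration; it is not part of csv-streamify and it ignores streaming, newlines inside fields, and other edge cases.

```javascript
// Hypothetical helper (not a csv-streamify API): split one log line on a
// delimiter, honoring quoted fields and backslash escapes inside quotes.
function splitLogLine(line, delimiter = ' ', quote = '"', escape = '\\') {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes && ch === escape && i + 1 < line.length) {
      // Inside quotes, a backslash makes the next character literal.
      field += line[++i];
    } else if (ch === quote) {
      // An unescaped quote opens or closes a quoted field.
      inQuotes = !inQuotes;
    } else if (ch === delimiter && !inQuotes) {
      // A delimiter outside quotes ends the current field.
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// String.raw keeps the backslashes, matching what the log file contains.
const line = String.raw`2016-06-01T14:09:00.418027Z anaconda-org "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1" "pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}" TLSv1.2`;
const fields = splitLogLine(line);
console.log(fields);
```

With the sample line above this yields five fields, and the fourth one keeps its inner double quotes.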

cboscolo, Jul 03 '16 16:07

Adding my support for this item - we are downstream from elb2loggly and this is a blocker for getting reliable logging from our load-balancers due to this issue. Tagging the downstream issue: cboscolo/elb2loggly#22

stephenakearns, Jul 25 '16 15:07

Sorry for not responding. I'm looking into it right now!

klaemo, Jul 25 '16 18:07

So, as far as I understand it, we have the following problem:

'"' === '\"' // true

The escaped and the unescaped quotation marks are the same in JS, because \ is the escape character of JavaScript strings (or I just haven't found a way to make sense of this yet).

You can't choose another escape char, I assume? For example, in CSV "" is often used to indicate an escaped ". This is also supported by this parser.

2016-06-01T14:09:00.418027Z anaconda-org "GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1" "pip/8.1.2 {""openssl_version"":""OpenSSL 1.0.2d 9 Jul 2015""}" TLSv1.2

[
    '2016-06-01T14:09:00.418027Z',
    'anaconda-org',
    'GET https://pypi.anaconda.org:443/username/simple/virtualenv/ HTTP/1.1',
    'pip/8.1.2 {"openssl_version":"OpenSSL 1.0.2d 9 Jul 2015"}',
    'TLSv1.2' 
]
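If the input cannot be changed at the source, one possible workaround is to rewrite each line before it reaches the parser, turning \" into "". The sketch below assumes lines are available as strings before parsing; `backslashToDoubledQuotes` is a hypothetical name, not something the parser provides, and whether the result parses as intended still depends on the parser's quote handling.

```javascript
// Hypothetical pre-processing helper: convert backslash-escaped quotes (\")
// into doubled quotes (""), leaving every other escape sequence untouched.
function backslashToDoubledQuotes(line) {
  // Match a backslash plus the character after it, so that an escaped
  // backslash (\\) is consumed as a pair and never mistaken for \".
  return line.replace(/\\(.)/g, (match, ch) => (ch === '"' ? '""' : match));
}

// String.raw keeps the backslashes, as they would appear in the log file.
const rawField = String.raw`"pip/8.1.2 {\"openssl_version\":\"OpenSSL 1.0.2d 9 Jul 2015\"}"`;
const converted = backslashToDoubledQuotes(rawField);
console.log(converted);
```

The converted field has the same shape as the doubled-quote line shown above.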

klaemo, Jul 25 '16 18:07

Unfortunately, the CSV files are being generated by AWS, so I have no control over how they escape the double quote. I tried two double quotes "", but that did not work either. I think using \" to escape the " inside a string that is delimited with " is the proper thing.

cboscolo, Jul 26 '16 05:07

I'm not saying \ isn't a thing, it's just that JavaScript interprets it as an escape character. Create two strings in your browser console or node REPL, one with backslash escaping and one without. You'll see they end up being identical. The backslash doesn't appear in the string anymore. That being the case, I don't think I can add support for this.

But: I could also be totally wrong and missing the point.
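For reference, the REPL experiment can be written out as below. It also shows a detail that may matter here: the backslash only disappears when it is typed inside a JavaScript string literal. Text read from a file or stream is not run through literal parsing, so the backslash is still present as its own character. `String.raw` is used as a stand-in for such file content; that substitution is an assumption made for illustration.

```javascript
// Two string literals: the JS parser consumes the backslash in '\"'.
const viaLiteral = '\"';
const plain = '"';
console.log(viaLiteral === plain); // true
console.log(viaLiteral.length);    // 1

// String.raw skips escape processing, like data read from a file would.
const fromFile = String.raw`\"`;
console.log(fromFile.length);          // 2
console.log(fromFile[0] === '\\');     // true
console.log(fromFile === viaLiteral);  // false
```

So a parser working on file data does see a backslash character in front of the quote.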

klaemo, Jul 26 '16 07:07