node-csv
Error when comment character is contained within CSV data
Summary
Hi 👋 thanks for the great lib!
We are using the option { "comment": "#" } to remove a header section from the CSV file which contains multiple lines beginning with '#' (as per bash syntax).
Motivation
The issue we face is that the hash (#) character may also appear as a valid character within the body of some rows, which results in a fatal column mismatch error.
For example:
# comment
# comment
col1,col2,col3
a,b,c
a,###,c
Alternative
My understanding of the documentation ("Treat all the characters after this one as a comment") is that both infix and prefix matching are currently supported, which makes sense for lines like this: a,b,c # this is a comment.
In my case at least, this caught me out: I assumed the match was prefix-only and expected it to apply only to lines which begin with the comment string (as per bash).
Draft
What I'd love to have is the ability to control whether this is applied as an infix match or only as a prefix match.
For example, if I were able to supply a regular expression I could use ^# to 'anchor' the string at the beginning of the row.
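In the meantime, one possible workaround is to strip the leading comment block before the data reaches the parser, and then parse without the comment option. This is only a minimal sketch: it assumes comment lines appear solely at the top of the file, and `stripLeadingComments` is a hypothetical helper, not part of node-csv.

```javascript
// Hypothetical helper, not part of node-csv.
// Removes only the leading run of lines that start with `marker`;
// everything from the first non-comment line onward is returned unchanged.
function stripLeadingComments(text, marker = '#') {
  let pos = 0;
  while (text.startsWith(marker, pos)) {
    const newline = text.indexOf('\n', pos);
    if (newline === -1) return ''; // the rest of the input is one comment line
    pos = newline + 1;
  }
  return text.slice(pos);
}

const input = '# comment\n# comment\ncol1,col2,col3\na,b,c\na,###,c\n';
const cleaned = stripLeadingComments(input);
// `cleaned` starts at the header row and still contains the "a,###,c" row.
```

Because the helper stops at the first line that does not begin with the marker, a hash inside a later row (or inside a quoted field) is never touched.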
Additional context
We're using the stream API. I wasn't able to find the exact places in the code where this is implemented, but presumably it is handled in a streaming fashion and so may or may not have access to the newline, depending on where in the parser it is implemented.
If you'd like to point me to the relevant places in the code, I might be able to draft a PR, although we'd need to discuss how best to change the JS API to let users configure whether infix matching is enabled.
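For the streaming case, the same idea can be sketched as a Transform placed in front of the parser. The names `createHeaderStripper` and `createLeadingCommentFilter` are hypothetical and not part of node-csv; the sketch assumes UTF-8 input and comment lines only at the top of the stream, and it buffers data only until the first non-comment line is seen.

```javascript
const { Transform } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Hypothetical helper, not part of node-csv. A small state machine that drops
// the leading run of lines starting with `marker`, across chunk boundaries.
function createHeaderStripper(marker = '#') {
  let inHeader = true;
  let pending = '';
  return {
    push(text) {
      if (!inHeader) return text; // header already consumed: pass through
      pending += text;
      while (inHeader) {
        // Too little data to tell whether this line starts with the marker.
        if (pending.length < marker.length && marker.startsWith(pending)) return '';
        if (!pending.startsWith(marker)) { inHeader = false; break; }
        const newline = pending.indexOf('\n');
        if (newline === -1) return ''; // comment line continues in next chunk
        pending = pending.slice(newline + 1);
      }
      const out = pending;
      pending = '';
      return out;
    },
    end() {
      // A trailing unterminated comment line is dropped; anything else is kept.
      const out = inHeader && pending.startsWith(marker) ? '' : pending;
      pending = '';
      return out;
    },
  };
}

// Wraps the state machine in a Transform so it can sit in a pipe chain.
function createLeadingCommentFilter(marker = '#') {
  const stripper = createHeaderStripper(marker);
  const decoder = new StringDecoder('utf8');
  return new Transform({
    transform(chunk, _encoding, callback) {
      const out = stripper.push(decoder.write(chunk));
      callback(null, out.length > 0 ? out : undefined);
    },
    flush(callback) {
      const out = stripper.push(decoder.end()) + stripper.end();
      callback(null, out.length > 0 ? out : undefined);
    },
  });
}
```

The filter would then be piped between the file stream and the csv-parse stream, with the comment option left unset so that a hash inside a row is treated as ordinary data.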
Hi @missinglink, supporting a regular expression is impossible. It would apply to the whole record, but to know what a record is, we need to parse it, because a record separator could be escaped or present inside a quoted field. However, with a comment present, attempting to parse the record will legitimately end in an error.
I'm not a big fan of introducing a new option, but I don't have much else to propose.
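To illustrate the point about record boundaries, here is a small sketch (plain JavaScript, no node-csv calls) of what goes wrong if ^# is applied line by line without parsing: a quoted field that spans two lines gets cut in half.

```javascript
// One header row plus one record whose quoted field contains a newline
// followed by a hash. The second physical line is data, not a comment.
const raw = 'id,note\n1,"first line\n# not a comment"\n';

// Naive prefix filter: drop every physical line that starts with '#'.
const naive = raw
  .split('\n')
  .filter((line) => !/^#/.test(line))
  .join('\n');

// The closing quote was on the removed line, so the quotes no longer balance.
const quoteCount = (naive.match(/"/g) || []).length;
```

After filtering, the record is left with an opening quote and no closing quote, so a parser would fail on it even though the original input was valid.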