Consuming a pipe-delimited file becomes erratic when the configuration is set to CsvConfiguration('|', '"', QuotePolicy.Always, Header.None)

Open jim-oflaherty-jr-qalocate-com opened this issue 4 years ago • 0 comments

Summary:

When consuming a pipe-delimited file, the parser becomes erratic with CsvConfiguration('|', '"', QuotePolicy.Always, Header.None). The first time a pipe character is immediately followed by a double-quote character [ex: "...|"..."], the parser apparently starts ignoring pipe characters until it hits the second instance of pipe+double-quote or the end of the file.
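The symptom described above is consistent with ordinary quote handling: a double quote at the start of a cell opens a quoted cell, and separators inside it are treated as literal text until the closing quote. The following is a minimal, self-contained sketch of that mechanism. It is an illustration only, not kantan.csv's actual implementation; `QuoteAwareSplit` and `splitLine` are hypothetical names, and the quoting rules are assumed to be roughly those of RFC 4180.

```scala
// Illustrative quote-aware line splitter. NOT kantan.csv's implementation;
// the object and method names are hypothetical.
object QuoteAwareSplit {
  def splitLine(line: String, separator: Char, quote: Char): List[String] = {
    val cells = List.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == quote) {
          if (i + 1 < line.length && line.charAt(i + 1) == quote) {
            current.append(quote) // doubled quote inside a quoted cell = literal quote
            i += 1
          } else {
            inQuotes = false // closing quote
          }
        } else {
          current.append(c) // separators are literal text while inside quotes
        }
      } else if (c == quote && current.isEmpty) {
        inQuotes = true // a quote at the start of a cell opens a quoted cell
      } else if (c == separator) {
        cells += current.toString
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    cells += current.toString // flush the last cell (also covers an unterminated quote)
    cells.result()
  }
}
```

With this sketch, `QuoteAwareSplit.splitLine("a|\"b|c|d", '|', '"')` yields two cells, `a` and `b|c|d`: the unmatched quote after the first pipe swallows every later pipe. When the quote is closed, as in `"a|\"b|c\"|d"`, the result is three cells: `a`, `b|c` and `d`. If the real parser behaves similarly, a stray double quote right after a pipe in the data would produce the oversized column value reported below.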

The documentation for CsvConfiguration, both the general descriptive documentation and the comments in the code itself, is extremely sparse. I couldn't figure out what to do from it, which forced me to do a lot of experimentation.

Details:

Undesired Effect: I am parsing a pipe-delimited file from an FCC ASR download. It kept blowing up my ingestion process by attempting to insert a column value of more than 8K characters into my target RDBMS, which exceeds the maximum width for that particular column.

Minimizing: I have used the crap out of this parser on many projects and never hit this issue. After spending close to two hours assuming it was an error in my own code, I finally isolated it to the kantan.csv parser. I then focused on minimizing the problem so I could submit an issue. It now comes down to the parameters passed to CsvConfiguration. When I changed the second parameter to also be a pipe character, the problem went away.

Fix: CsvConfiguration('|', '|', QuotePolicy.Always, Header.None)

WARNING: This CsvConfiguration does not handle an escaped pipe character (backslash+pipe).