Sep does not gracefully handle improperly-escaped double quotes

Open bryanboettcher opened this issue 7 months ago • 3 comments

@nietras (sorry, I didn't have a more graceful way to tag you)

First off, kudos to Sep for the absolutely blistering speed you've achieved with it. I understand that in many cases this speed was gained by intentionally omitting checks on the correctness of a CSV. At the same time, there are a few existing cases where Sep's strictness can be tuned (such as DisableColumnCountCheck).

A vendor has a process that occasionally generates a malformed CSV. We get a row that's otherwise properly escaped, but with an improper backslash-escaped double quote left inside a string field, like so: ,"XXX\" YYYY",. Whenever Sep hits this data, we get the 16MB buffer overrun exception shortly afterward.

Our vendor is aware of the problem but has given no ETA for fixing their process. Is there a mechanism, now or planned, for Sep to either blindly remove this character or handle it more gracefully? Using DisableQuotesParsing is insufficient in this case, because many of our fields are (properly) quoted and escaped.
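One stopgap outside the parser is to repair the text before it reaches the reader. The sketch below is in Python purely for illustration and does not use Sep or any of its API; `repair_backslash_quotes` is a hypothetical helper name. It assumes a backslash followed by a double quote is the vendor's bad escape only when the quote is not immediately followed by a comma or a line end, since in that position it is more likely a literal backslash before a genuine closing quote. That heuristic can misfire on real data, so treat it as a starting point rather than a fix.

```python
import csv
import io
import re

# Assumption: \" is the vendor's bad escape only when the quote is NOT
# immediately followed by a comma or a line end. In that position it is more
# likely a literal backslash followed by a genuine closing quote.
_BAD_ESCAPE = re.compile(r'\\"(?![,\r\n]|$)')


def repair_backslash_quotes(text: str) -> str:
    """Rewrite a backslash-escaped quote as the standard CSV escape ("")."""
    return _BAD_ESCAPE.sub('""', text)


# The malformed field from the report, embedded in a three-column row.
sample = 'a,"XXX\\" YYYY",b\n'
fixed = repair_backslash_quotes(sample)

# Parse the repaired text with the standard library reader.
rows = list(csv.reader(io.StringIO(fixed)))
# rows == [['a', 'XXX" YYYY', 'b']]
```

The repair keeps the quote character in the field and drops the stray backslash. Running it over a whole file costs an extra pass over the data, which matters if parsing speed is the reason for using Sep in the first place.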

I have a repro: https://github.com/bryanboettcher/SepCrash

bryanboettcher, Jul 24 '24 19:07