csv-conduit
Less aggressive quoting
Currently csv-conduit outputs the string "" for empty fields. Postgres throws the following error when it encounters this for fields of type double precision:
ERROR: invalid input syntax for type double precision: ""
So while the current behavior is correct according to the spec, it seems to be less broadly supported in practice. Also, if you're using csv-conduit to transform large files, the current behavior means that every single field will be quoted. This means that you're outputting two additional bytes per field, making the resulting files noticeably larger than they need to be.
This PR only quotes fields if they contain the quote character, which is correct behavior according to the spec.
I don't know if I can justify changing the behavior for all users here. I think I'd prefer adding a flag to CSVSettings, something like data OutputQuoting = AlwaysQuote | QuoteWhenNeeded, and having the default continue to be AlwaysQuote.
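To make the suggestion concrete, here is a minimal, self-contained sketch of how such a flag might drive field rendering. It is not csv-conduit's actual code: QuoteSettings is a hypothetical, simplified stand-in for CSVSettings, and renderField, renderRow, and needsQuoting are illustrative names. The sketch also assumes that QuoteWhenNeeded should quote fields containing the separator or a line break as well as the quote character, which is broader than the quote-character-only check the PR describes.

```haskell
module Main where

import Data.List (intercalate)

-- Flag named as in the suggestion above.
data OutputQuoting = AlwaysQuote | QuoteWhenNeeded
  deriving (Eq, Show)

-- Hypothetical, simplified stand-in for the library's CSVSettings record.
data QuoteSettings = QuoteSettings
  { separator     :: Char
  , quoteChar     :: Char
  , outputQuoting :: OutputQuoting
  }

-- The default keeps the existing behavior: every field is quoted.
defaultQuoteSettings :: QuoteSettings
defaultQuoteSettings = QuoteSettings ',' '"' AlwaysQuote

-- Wrap a field in quote characters, doubling any embedded quote character.
quoteField :: Char -> String -> String
quoteField q s = q : concatMap escape s ++ [q]
  where
    escape c
      | c == q    = [q, q]
      | otherwise = [c]

-- Assumption: a field needs quoting if it contains the separator,
-- the quote character, or a line break.
needsQuoting :: QuoteSettings -> String -> Bool
needsQuoting st = any (`elem` [separator st, quoteChar st, '\n', '\r'])

-- Render one field according to the configured quoting mode.
renderField :: QuoteSettings -> String -> String
renderField st s = case outputQuoting st of
  AlwaysQuote -> quoteField (quoteChar st) s
  QuoteWhenNeeded
    | needsQuoting st s -> quoteField (quoteChar st) s
    | otherwise         -> s

-- Render a whole row by joining the rendered fields with the separator.
renderRow :: QuoteSettings -> [String] -> String
renderRow st = intercalate [separator st] . map (renderField st)

main :: IO ()
main = do
  let row = ["1.5", "", "a,b", "say \"hi\""]
  -- prints: "1.5","","a,b","say ""hi"""
  putStrLn (renderRow defaultQuoteSettings row)
  -- prints: 1.5,,"a,b","say ""hi"""
  putStrLn (renderRow defaultQuoteSettings { outputQuoting = QuoteWhenNeeded } row)
```

With QuoteWhenNeeded the empty field is written as nothing at all rather than "", which is the case that triggered the Postgres error, while existing users who never touch the flag keep the current output.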
Ahh yes, good idea. I'll try to get to it when I have some free time.