csv-conduit

Less aggressive quoting

Open mightybyte opened this issue 6 years ago • 2 comments

Currently csv-conduit outputs the string "" for empty fields. Postgres throws the following error when it encounters this for fields of type double precision:

ERROR:  invalid input syntax for type double precision: ""

So while the current behavior is correct according to the spec, it seems to be less broadly supported in practice. Also, if you're using csv-conduit to transform large files, the current behavior means that every single field is quoted. That adds two bytes per field, making the resulting files noticeably larger than they need to be.

This PR quotes a field only if it contains the quote character, which is correct behavior according to the spec.

mightybyte, Feb 25 '18 17:02

I don't know if I can justify changing the behavior for all users here. I think I'd prefer adding a flag to CSVSettings, something like data OutputQuoting = AlwaysQuote | QuoteWhenNeeded, and keeping AlwaysQuote as the default.
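For illustration, here is a minimal sketch of how such a flag might drive field rendering. It is not csv-conduit's actual API: it uses plain String and base only, and renderField, renderRow, quoteAlways, and needsQuoting are hypothetical names. Note one assumption that differs from the PR as described: in QuoteWhenNeeded mode this sketch also quotes fields containing the separator or a line break, not just the quote character, since otherwise the output could not be parsed back.

```haskell
import Data.List (intercalate)

-- Hypothetical flag, named as suggested in this comment; the thread does
-- not show it existing in csv-conduit's CSVSettings.
data OutputQuoting = AlwaysQuote | QuoteWhenNeeded
  deriving (Eq, Show)

-- Wrap a field in the quote character, doubling any embedded quotes.
quoteAlways :: Char -> String -> String
quoteAlways q field = [q] ++ concatMap escape field ++ [q]
  where
    escape c
      | c == q    = [q, q]
      | otherwise = [c]

-- Assumption: a field needs quoting if it contains the separator, the
-- quote character, or a line break.
needsQuoting :: Char -> Char -> String -> Bool
needsQuoting sep q = any (`elem` [sep, q, '\n', '\r'])

-- Render one field according to the chosen quoting mode.
renderField :: OutputQuoting -> Char -> Char -> String -> String
renderField AlwaysQuote _ q field = quoteAlways q field
renderField QuoteWhenNeeded sep q field
  | needsQuoting sep q field = quoteAlways q field
  | otherwise                = field

-- Render a row with ',' as separator and '"' as quote character.
renderRow :: OutputQuoting -> [String] -> String
renderRow mode = intercalate "," . map (renderField mode ',' '"')
```

With this sketch, renderRow AlwaysQuote ["a", "", "1.5"] gives "a","","1.5" while renderRow QuoteWhenNeeded ["a", "", "1.5"] gives a,,1.5, so an empty field is written as nothing between the separators rather than as a quoted empty string.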

MichaelXavier, Feb 26 '18 16:02

Ahh yes, good idea. I'll try to get to it when I have some free time.

mightybyte, Feb 26 '18 20:02