xsv

Is there an option to write without CSV escapes?

Open jondegenhardt opened this issue 6 years ago • 8 comments

A question: Is there an option to produce output without the CSV escape syntax? This would be to generate a stricter TSV format, without escapes.

I don't see one, and the documentation is pretty good, but I'd just like to make sure I haven't missed something. The fmt command has a number of options that control the escaping used, but I didn't see one that turns it off.

Some examples:

$ # fmt -t will change the delimiter and drop surrounding quotes (without --quote-always)
$ echo '"abc","def"' | xsv fmt -t $'\t'
abc	def

$ # Escapes are generated if a field contains a quote
$ echo '"abc","d""ef"' | xsv fmt -t $'\t'
abc	"d""ef"

$ # In tsv the result would be:
$ #    abc	d"ef

$ # Similarly with embedded field and record separators (tab/newline).
$ # In TSV they are disallowed, and might be replaced by a space when encountered.
$ echo $'"abc","d\tef"' | xsv fmt -t $'\t'
abc	"d	ef"

$ # In the above, the embedded tab character was retained.

Again, I'm only asking if there is an option I haven't found. In the examples above the fmt command is doing exactly what it says, which is to change the CSV delimiter character.
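For reference, here is a minimal sketch of the conversion described above. It is written in Python with the standard csv module, not with xsv, and csv_to_strict_tsv is a hypothetical helper rather than an xsv feature. It parses ordinary CSV, replaces any embedded tab or newline with a space (the substitution policy suggested above), and joins the fields with tabs without quoting or escaping anything.

```python
import csv
import io
import re

# Embedded field/record separators that strict TSV cannot represent.
_TSV_SPECIAL = re.compile(r"\r\n|[\t\r\n]")

def csv_to_strict_tsv(text: str, replacement: str = " ") -> str:
    """Convert CSV text to TSV with no quoting or escaping (hypothetical helper).

    Tabs and line breaks inside a field are replaced with `replacement`,
    since strict TSV has no way to represent them.
    """
    lines = []
    for row in csv.reader(io.StringIO(text)):
        lines.append("\t".join(_TSV_SPECIAL.sub(replacement, field) for field in row))
    return "".join(line + "\n" for line in lines)

# The three inputs from the examples above.
print(repr(csv_to_strict_tsv('"abc","def"\n')))     # 'abc\tdef\n'
print(repr(csv_to_strict_tsv('"abc","d""ef"\n')))   # 'abc\td"ef\n'
print(repr(csv_to_strict_tsv('"abc","d\tef"\n')))   # 'abc\td ef\n'
```

Note that substitution is lossy: once converted, a field that originally held a tab can no longer be told apart from one that held a space.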

jondegenhardt, Nov 09 '17 19:11

@jondegenhardt Thanks for the detailed question! I do not believe there is any such option. In fact, the underlying CSV writer doesn't support it, which is how I know there isn't one. The CSV writer options are here: https://docs.rs/csv/1.0.0-beta.5/csv/struct.WriterBuilder.html. We might consider changing escape to accept an Option<u8>, so that when it and double_quote are both disabled, no escaping is performed. We would also need to add a --quote-never option, I suppose.

The last part, silently changing \t and \n into something else, is more complicated.

My estimation is that this is a bit of an awkward fit for xsv at the moment.
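To see why the tab/newline part matters, here is a toy Python sketch of a writer that never quotes and never escapes. write_quote_never and split_tsv are hypothetical names, not part of xsv or the csv crate. A field with an embedded quote passes through unchanged, which is what a strict TSV user wants, but a field with an embedded tab or newline silently changes the shape of the data when the output is read back.

```python
def write_quote_never(rows, delimiter="\t"):
    """Write rows verbatim: no quoting and no escaping (hypothetical helper)."""
    return "".join(delimiter.join(row) + "\n" for row in rows)

def split_tsv(text, delimiter="\t"):
    """Naive strict-TSV reader: one record per line, fields split on the delimiter."""
    return [line.split(delimiter) for line in text.splitlines()]

# An embedded quote survives the round trip unchanged.
print(split_tsv(write_quote_never([["abc", 'd"ef']])))   # [['abc', 'd"ef']]

# An embedded tab becomes an extra column; an embedded newline becomes an extra record.
print(split_tsv(write_quote_never([["abc", "d\tef"]])))  # [['abc', 'd', 'ef']]
print(split_tsv(write_quote_never([["abc", "d\nef"]])))  # [['abc', 'd'], ['ef']]
```

So a never-quote mode on its own does not give strict TSV; something still has to decide what happens to embedded tabs and newlines.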

BurntSushi, Nov 09 '17 21:11

Very good, thanks for the detailed response. The CSV doc reference is helpful.

jondegenhardt, Nov 09 '17 22:11

I was about to open a new issue about this (cf. the comments from https://github.com/BurntSushi/xsv/issues/67#issuecomment-480218068 onward), but I see this one has already been closed. @jondegenhardt if you still need something like --quote-never, xsv 0.13.0 seems to do it if you pass ASCII character 1 as the quote character (--quote $'\1'), though since that's not documented anywhere I guess it comes with no guarantees :-)

$ printf 'user\tutterance\njoe\tSay "hi"\n'|xsv select -d $'\t' utterance \
    | xsv fmt -t $'\t'
utterance
"Say ""hi"""

$ printf 'user\tutterance\njoe\tSay "hi"\n'|xsv select -d $'\t' utterance \
    | xsv fmt --quote $'\1' -t $'\t'
utterance
Say "hi"

$ printf 'user\tutterance\njoe\tSay "hi"\n'|xsv select -d $'\t' utterance \
    | xsv fmt --quote $'\1' -t $'\t' \
    | grep -c $'\1'
0

unhammer, Apr 05 '19 11:04

@unhammer Just to clarify, using the ASCII byte 1 only works because it presumably does not appear in your input anywhere. If it did, then it would need to be quoted. Moreover, if your input contained a field that spanned multiple lines, then it would also need to be quoted.
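That caveat can be reproduced outside xsv. The sketch below uses Python's csv writer as a stand-in for the Rust csv crate that xsv uses (an assumption: their minimal-quoting rules are similar but not guaranteed identical), with \x01 as the quote character and tab as the delimiter, roughly mimicking xsv fmt --quote $'\1' -t $'\t'. fmt_with_quote is a hypothetical helper. On the input from the example no \x01 appears, but as soon as a field spans two lines it is quoted and \x01 shows up in the output.

```python
import csv
import io

def fmt_with_quote(rows, quotechar):
    """Write rows tab-separated with minimal quoting, using `quotechar` as the quote.

    Hypothetical helper; Python's csv module stands in for the Rust csv crate.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quotechar=quotechar,
                        lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buf.getvalue()

# The input from the example: no tab, newline, or \x01 inside any field.
safe = fmt_with_quote([["utterance"], ['Say "hi"']], "\x01")
print(repr(safe))   # 'utterance\nSay "hi"\n'

# A field that spans two lines still has to be quoted, so \x01 leaks into the output.
risky = fmt_with_quote([["utterance"], ["line one\nline two"]], "\x01")
print(repr(risky))  # 'utterance\n\x01line one\nline two\x01\n'
```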

I'm not sure why this was closed. The underlying CSV writer does support it, so I think this is as easy as adding a new --quote-never flag and hooking it up.

BurntSushi, Apr 05 '19 11:04

Aha, thanks for the clarification, good to know the exact dangers involved.

unhammer, Apr 05 '19 12:04

The reason I closed it was that my question had been answered. I didn't mean to suggest the feature would not be useful.

jondegenhardt, Apr 05 '19 16:04

Hi, I think --quote-never would help in the cases mentioned above. Do you plan to implement this?

Thanks!

(and congrats on the tool, it's great)

bosr, Oct 14 '19 18:10

> A question: Is there an option to perform output without the CSV escape syntax? This would be to generate a more strict TSV format, without escapes.

What’s the behavior when a tab or newline is encountered in the data? Are they just converted to spaces (the example says “and might be replaced by a space when encountered”)? Or should the program error out?
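The alternative to substitution is to refuse such input. Here is a small Python sketch of that policy; to_strict_tsv_line and StrictTsvError are hypothetical names, not an existing xsv option. Fields are joined with tabs without quoting, and a field containing a tab, carriage return, or newline raises an error instead of being silently altered.

```python
class StrictTsvError(ValueError):
    """Raised when a field cannot be represented in strict TSV (hypothetical)."""

def to_strict_tsv_line(fields):
    """Join fields with tabs; never quote, and refuse tabs or line breaks in fields."""
    for i, field in enumerate(fields):
        if any(ch in field for ch in "\t\r\n"):
            raise StrictTsvError(f"field {i} contains a tab or line break: {field!r}")
    return "\t".join(fields)

print(repr(to_strict_tsv_line(["abc", 'd"ef'])))  # 'abc\td"ef'

try:
    to_strict_tsv_line(["abc", "d\tef"])
except StrictTsvError as err:
    print("rejected:", err)
```

Erroring out keeps the output lossless at the cost of failing on such data, whereas replacing with a space always produces output but cannot be reversed.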

LemmingAvalanche, Jun 27 '23 12:06