
Raw value output from fmt

Open amake opened this issue 3 years ago • 4 comments

I have a CSV where one of the fields is actually JSON:

foo,"[""bar"",""baz""]",bazinga

(Before anyone tells me this is dumb: Yes, I agree. But it's what I have to work with, for various reasons.)

I would like to extract just the JSON column for further processing as JSON, but there doesn't seem to be a way to convince xsv fmt to output the "raw" text value; it is always quoted for CSV purposes.

I imagine it would look something like this, e.g. via an -r or --raw flag:

$ echo 'foo,"[""bar"",""baz""]",bazinga' | xsv select 2 | xsv fmt -r
["bar","baz"]

Or, as a hack, fmt --quote could accept an empty string I guess?

(I know that such output would not be suitable for further processing by xsv, but that's kind of the point.)

For prior art on this, see for instance the --raw-output / -r flag in jq.

amake avatar Feb 16 '21 00:02 amake

I suspect this would be a better fit for the xsv select command itself?

Whether it goes in xsv select or xsv fmt, I think more specification is required. For example, what happens if more than one column or row is selected? What delimiter is used for the raw output?

BurntSushi avatar Feb 16 '21 13:02 BurntSushi

> I suspect this would be a better fit for the xsv select command itself?

Sure, that would be fine.

> Whether it goes in xsv select or xsv fmt, I think more specification is required. For example, what happens if more than one column or row is selected?

Given the file my.csv below:

foo,"[""bar"",""baz""]",bazinga
buzz,"[""bang"",""boom""]",blammo

For selecting a single column with multiple rows, I would expect one value per line, like:

$ cat my.csv | xsv select 2 --raw
["bar","baz"]
["bang","boom"]
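The single-column behavior described above is small enough to sketch outside xsv. The following is only an illustration in Python using the standard csv module, not a suggestion for how xsv would implement it. The `raw_column` helper and its 1-based index are assumptions chosen to mirror `xsv select 2`, and every row is treated as data (no header row), as in the example output above.

```python
import csv
import io


def raw_column(csv_text, column):
    """Return the unquoted values of one column (1-based, like xsv select)."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return [row[column - 1] for row in reader]


my_csv = 'foo,"[""bar"",""baz""]",bazinga\nbuzz,"[""bang"",""boom""]",blammo\n'

# One raw value per line, separated by U+000A.
print("\n".join(raw_column(my_csv, 2)))
```

The csv reader undoes the CSV quoting, so each value comes out as the original JSON text and can be piped to a line-wise JSON tool.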

One problem arises when the quoted values themselves contain newlines: the output would then be very hard to consume line by line. For my purposes it would be nice to escape in-value newlines somehow, e.g. as \n, but I'm not sure that works without making assumptions about the content or the downstream use case.

(Even if in-value newlines break things, I think raw output could still be useful for when you are sure you don't have in-value newlines, which for me is pretty often.)
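To make the newline-escaping idea concrete, here is a rough Python sketch using the standard csv module. The escaping scheme (backslash becomes `\\`, line feed becomes `\n`, carriage return becomes `\r`) and the `raw_escaped` name are assumptions for illustration only. It also shows the trade-off mentioned above: the downstream consumer has to know about the scheme and undo it.

```python
import csv
import io


def raw_escaped(csv_text, column):
    """Yield one column's values with in-value line breaks escaped."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row in reader:
        value = row[column - 1]
        # Escape backslashes first so an escaped newline stays unambiguous.
        value = value.replace("\\", "\\\\")
        value = value.replace("\r", "\\r").replace("\n", "\\n")
        yield value


# The second field contains a real line feed inside the quotes.
sample = 'foo,"line one\nline two",bazinga\n'
for value in raw_escaped(sample, 2):
    print(value)
```

With this scheme every record occupies exactly one output line, at the cost of rewriting any backslashes the value already contained.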

Multiple columns: To be honest I hadn't thought about this. I'm not sure what to expect; perhaps one value per line, interleaved row by row, like:

$ cat my.csv | xsv select 1,2 --raw
foo
["bar","baz"]
buzz
["bang","boom"]
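The interleaved layout above could be sketched as follows. This is a Python illustration using the standard csv module, not xsv code; `raw_interleaved` is a hypothetical helper, column numbers are 1-based to mirror `xsv select 1,2`, and every row is treated as data.

```python
import csv
import io


def raw_interleaved(csv_text, columns):
    """Yield the selected values row by row, one value per output line."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row in reader:
        for column in columns:  # 1-based, like xsv select
            yield row[column - 1]


my_csv = 'foo,"[""bar"",""baz""]",bazinga\nbuzz,"[""bang"",""boom""]",blammo\n'

print("\n".join(raw_interleaved(my_csv, [1, 2])))
```

Note that a consumer of this output has to know how many columns were selected in order to group the lines back into rows.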

> What delimiter is used for the raw output?

Ultimately my aim is to pipe things to other line-wise programs, so the natural answer is U+000A (line feed).

amake avatar Feb 16 '21 13:02 amake

I also think this would be a great use case to add for xsv.

I had similar data and did the following to work around:

xsv select col_name file.csv | sed -E 's/""/"/g; s/^"//g; s/"$//g' | tail -n +2 | jq .

Basically, sed unescapes the doubled quotes and strips the enclosing ones, and tail skips the header row before the result is passed to jq.
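The same workaround can also be sketched with a real CSV parser, which undoes the quoting without regexes. This is a Python illustration using only the standard library; `json_column` and the sample data are hypothetical, and `col_name` is the placeholder column name from the command above.

```python
import csv
import io
import json


def json_column(csv_text, name):
    """Parse the JSON stored in the named column of each data row."""
    # DictReader consumes the header row, so no separate skip step is needed.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [json.loads(row[name]) for row in reader]


sample = 'id,col_name\n1,"[""bar"",""baz""]"\n2,"[""bang"",""boom""]"\n'

for value in json_column(sample, "col_name"):
    print(json.dumps(value))
```

Reading from a file instead would mean opening it with `newline=""` and passing the file object to `csv.DictReader` directly.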

JLHasson avatar May 12 '22 18:05 JLHasson

In case other people find this via Google and need a workaround, this is mine using qsv and jq:

< my.csv qsv tojsonl | jq '.RAW_JSON' -r

I would still love to see this feature in xsv.

fluffysquirrels avatar Jan 09 '23 10:01 fluffysquirrels