faq icon indicating copy to clipboard operation
faq copied to clipboard

CSV

Open eadmaster opened this issue 4 years ago • 7 comments

With separators auto-guessing and optional overrides: https://en.wikipedia.org/wiki/Comma-separated_values

faq -f json -o csv < file.json

eadmaster avatar May 27 '20 15:05 eadmaster

I think this is a reasonable addition, given the existence of a good Go CSV library. Suggestions welcome!

jzelinskie avatar May 27 '20 15:05 jzelinskie

converting csv->json should be a no brainer, while the opposite is not because csv only supports a flat schema.

Using the dot notation seem the most common solution for nested records:

  • https://stackoverflow.com/questions/41028114/how-to-convert-array-of-nested-objects-to-csv
  • https://stackoverflow.com/questions/37706351/nested-json-to-csv-generic-approach

Btw since this is rarely used i think you could just make the conversion fail in this case.

eadmaster avatar May 28 '20 21:05 eadmaster

Encoding is indeed a huge pain. It may be best if you just start with decoding only.

When I tried to implement this and I found it to be a pain because you have to basically decide how to handle non-arrays and arrays of non-homogenous objects. It's probably okay to simply fail in many cases, but limits the usefulness quite a bit.

For decoding, you'll also need to think about the following:

  • Do you produce an array of arrays or an array of objects? When I talked to Jimmy about this I brought up the idea that there could be a new flag to control decoder specific behavior. Eg; --decoder-opts="csv=array-of-arrays".
  • How do you handle headers missing if you're producing an array of objects? Eg probably just set the objects fields to something like 0, 1 or field1, field2.
  • If producing an array of arrays: do you include the CSV headers in the first array? Maybe another decoder option.

chancez avatar May 28 '20 22:05 chancez

* How do you handle headers missing if you're producing an array of objects? Eg probably just set the objects fields to something like `0`, `1` or `field1`, `field2`.

An array of objects should be nicer.

* How do you handle headers missing if you're producing an array of objects?

you can omit the fields in the objects or leave them as empty strings.

e.g. you have this csv:

name,children,thing
1,2,3
,,2,3
1,,3

This is the catmandu convert CSV to JSON output:

[
  {
    "thing": "3",
    "name": "1",
    "children": "2"
  },
  {
    "children": "",
    "name": "",
    "thing": "2"
    // note the 3rd value is omitted here because it has no header
  },
  {
    "children": "",
    "name": "1",
    "thing": "3"
  }
]

Alternative with omitted missing fields:

[
  {
    "thing": "3",
    "name": "1",
    "children": "2"
  },
  {
    "thing": "2"
  },
  {
    "name": "1",
    "thing": "3"
  }
]

eadmaster avatar May 29 '20 12:05 eadmaster

I wasn't talking about values missing, but headers. Eg: What does this CSV produce as JSON?

1,2,3
,,2,3
1,,3

chancez avatar May 29 '20 18:05 chancez

catmandu just assume the first row is always the header.

$ cat test.csv
1,2,3
,,2,3
1,,3

$ catmandu convert CSV to JSON < test.csv  | jq .                                                                                                                                                                                                                                   
[
  {
    "1": "",
    "2": "",
    "3": "2"
  },
  {
    "3": "3",
    "2": "",
    "1": "1"
  }
]

Another command that is able to convert csv->json is rows (you can install via the python-rows package) :

$ rows  convert  test.csv  test.json
$ cat test.json | jq
[
  // same behavior here
  {
    "field_1": null,
    "field_2": null,
    "field_3": 2
  },
  {
    "field_1": 1,
    "field_2": null,
    "field_3": 3
  }
]

With the headers i get this, which looks a bit confusing to me:

[
  {
    "name": 1,
    "children": 2,
    "thing": 3,
    "field_3": null
  },
  {
    "name": null,
    "children": null,
    "thing": 2,
    "field_3": 3
  },
  {
    "name": 1,
    "children": null,
    "thing": 3,
    "field_3": null
  }
]

eadmaster avatar May 29 '20 18:05 eadmaster

I think the 2nd output is more what I was describing. I don't think it's wise to assume every file has headers, hence the point I was trying to make about having decoder specific options to control this kind of behavior. It's probably fine to start with assuming headers are set. Someone can add headers like this:

faq -fcsv -ojson '.' <(echo col1,col2,col3; cat my-csv-file.csv)

chancez avatar May 29 '20 19:05 chancez