jackson-dataformat-csv

Support hierarchical delimiters and qualifiers.

Open prb opened this issue 10 years ago • 4 comments

The goal would be to transform nested delimiters into nested structures; sample input:

a,b;c;d,e;f

Sample output:

[
  [a],
  [b,c,d],
  [e,f]
]

(Brought over here from a discussion in FasterXML/jackson#2.)
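
For illustration, the requested transformation could be sketched in plain Java as below. This is only a sketch under simple assumptions: the class and method names (`NestedDelimiterSketch`, `parseLine`) are hypothetical, nothing here is part of the Jackson API, and quoting and escaping are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NestedDelimiterSketch {
    // Hypothetical helper: split a line on the outer delimiter, then split
    // each column on the inner delimiter. Quoting and escaping are not handled.
    static List<List<String>> parseLine(String line, char outer, char inner) {
        String outerRegex = Pattern.quote(String.valueOf(outer));
        String innerRegex = Pattern.quote(String.valueOf(inner));
        List<List<String>> row = new ArrayList<>();
        // limit -1 keeps trailing empty values instead of dropping them
        for (String column : line.split(outerRegex, -1)) {
            row.add(Arrays.asList(column.split(innerRegex, -1)));
        }
        return row;
    }

    public static void main(String[] args) {
        // prints [[a], [b, c, d], [e, f]]
        System.out.println(parseLine("a,b;c;d,e;f", ',', ';'));
    }
}
```

Note that every column comes back as a list here, including the single-element `[a]`; deciding when a column should be a scalar instead is left open.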

prb, Aug 10 '13 20:08

One quick mental note: I think this needs to be a feature that must be explicitly enabled, because it will require re-scanning of column values. And/or it needs support from the higher-level data-binder; we already get a "hint" from the data-binder if an array value is expected (needed to support XML arrays).

Actually, come to think of it, "isExpectedStartArray" is probably needed anyway to support single-element arrays reliably. In addition, the separator needs to be configurable; a default of semicolon seems reasonable.

For the output side it might be possible to make this work with fewer extra settings... a START_ARRAY could indicate a mode in which values are appended with the separator. So perhaps the implementation could start with the output side, as that should be simpler to complete.
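
The output-side idea can be sketched roughly as follows, in plain Java rather than against the actual generator. The assumption is that a nested `List` stands in for what a generator would see between START_ARRAY and END_ARRAY; `NestedRowWriterSketch` and `writeRow` are hypothetical names, and no quoting of values is attempted.

```java
import java.util.Arrays;
import java.util.List;

public class NestedRowWriterSketch {
    // Hypothetical helper: write one row, joining columns with the outer
    // separator and the elements of any nested list with the inner separator.
    static String writeRow(List<?> row, char outer, char inner) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(outer);
            }
            Object cell = row.get(i);
            if (cell instanceof List) {
                // Stands in for START_ARRAY ... END_ARRAY: switch to inner-separator mode
                List<?> values = (List<?>) cell;
                for (int j = 0; j < values.size(); j++) {
                    if (j > 0) {
                        sb.append(inner);
                    }
                    sb.append(values.get(j));
                }
            } else {
                sb.append(cell);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<?> row = Arrays.asList("a", Arrays.asList("b", "c", "d"), Arrays.asList("e", "f"));
        // prints a,b;c;d,e;f
        System.out.println(writeRow(row, ',', ';'));
    }
}
```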

cowtowncoder, Aug 10 '13 20:08

Ok, so: an "inner delimiter" itself sounds reasonable. But how about quoting it? One possibility would be to use doubling (similar to quotes), although that would mean one could not omit values (i.e. use an empty String as a marker for null).
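
To make the trade-off concrete, here is a small sketch of doubling for the inner separator. It assumes a doubled separator always means one literal separator character; `DoublingSketch` and `splitInner` are hypothetical names, not existing Jackson code.

```java
import java.util.ArrayList;
import java.util.List;

public class DoublingSketch {
    // Hypothetical helper: split a column on the inner separator, treating a
    // doubled separator as one literal separator character.
    static List<String> splitInner(String column, char sep) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < column.length(); i++) {
            char c = column.charAt(i);
            if (c != sep) {
                current.append(c);
            } else if (i + 1 < column.length() && column.charAt(i + 1) == sep) {
                current.append(sep); // doubled separator: keep one, skip the other
                i++;
            } else {
                values.add(current.toString());
                current.setLength(0);
            }
        }
        values.add(current.toString());
        return values;
    }

    public static void main(String[] args) {
        // prints [a;b, c]
        System.out.println(splitInner("a;;b;c", ';'));
        // prints [a;b]: the empty middle value of "a", "", "b" cannot be expressed
        System.out.println(splitInner("a;;b", ';'));
    }
}
```

With this rule `a;;b` can only ever mean the single value `a;b`, which is exactly why an empty inner value can no longer serve as a null marker.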

I think that ideally this should work in a way to allow two-phase tokenization, which is much simpler to implement than (theoretically more efficient) single-pass, multi-state tokenization.

On the other hand: single-pass tokenization would allow use of an escape character for inner values too, whereas two-phase does not (because the first pass handles unescaping, which makes it impossible for the second pass to tell which separators had been escaped).
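
The escaping problem with two phases can be shown with a small sketch. It assumes backslash as the escape character, comma as the outer separator and semicolon as the inner one; `TwoPhaseSketch`, `firstPass` and `twoPhase` are hypothetical names and do not reflect how the actual parser is structured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPhaseSketch {
    // Phase 1 (hypothetical): split on ',' and resolve backslash escapes.
    static List<String> firstPass(String line) {
        List<String> columns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // unescape: keep next char literally
            } else if (c == ',') {
                columns.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        columns.add(current.toString());
        return columns;
    }

    // Phase 2 (hypothetical): re-scan each already-unescaped column for ';'.
    static List<List<String>> twoPhase(String line) {
        List<List<String>> row = new ArrayList<>();
        for (String column : firstPass(line)) {
            row.add(Arrays.asList(column.split(";", -1)));
        }
        return row;
    }

    public static void main(String[] args) {
        // Input text is x\;y;z,w and the intended first column is ["x;y", "z"].
        // prints [[x, y, z], [w]] because phase 1 already removed the backslash
        System.out.println(twoPhase("x\\;y;z,w"));
    }
}
```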

cowtowncoder, Mar 26 '14 02:03

Maybe implement this feature like this: https://github.com/Keyang/node-csvtojson#empowered-json-parser. I think this allows arbitrary nesting of data in JSON while still being easily editable in spreadsheets. TL;DR: it describes the data hierarchy in the header, like data.field[0].property.
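
A rough sketch of how such header paths could be expanded into nested structures, written in plain Java. It is loosely modeled on the `data.field[0].property` convention only; it is not csvtojson's implementation or any Jackson API, the names `HeaderPathSketch`, `parsePath` and `put` are hypothetical, and it assumes every path starts with a property name rather than an index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderPathSketch {
    // Matches either a property name or a bracketed index such as [0]
    private static final Pattern STEP = Pattern.compile("([^.\\[\\]]+)|\\[(\\d+)\\]");

    // "data.field[0].property" -> ["data", "field", 0, "property"]
    static List<Object> parsePath(String header) {
        List<Object> steps = new ArrayList<>();
        Matcher m = STEP.matcher(header);
        while (m.find()) {
            if (m.group(1) != null) {
                steps.add(m.group(1));
            } else {
                steps.add(Integer.valueOf(m.group(2)));
            }
        }
        return steps;
    }

    // Store value under the given header path, creating maps and lists as needed.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String header, Object value) {
        List<Object> steps = parsePath(header);
        Object node = root;
        for (int i = 0; i < steps.size(); i++) {
            Object step = steps.get(i);
            boolean last = i == steps.size() - 1;
            Object child = value;
            if (!last) {
                // The next step decides the container type: index -> list, name -> map
                if (steps.get(i + 1) instanceof Integer) {
                    child = new ArrayList<Object>();
                } else {
                    child = new LinkedHashMap<String, Object>();
                }
            }
            if (step instanceof String) {
                Map<String, Object> map = (Map<String, Object>) node;
                if (last) {
                    map.put((String) step, value);
                } else {
                    map.putIfAbsent((String) step, child);
                }
                node = map.get(step);
            } else {
                List<Object> list = (List<Object>) node;
                int index = (Integer) step;
                while (list.size() <= index) {
                    list.add(null); // pad so the index exists
                }
                if (last || list.get(index) == null) {
                    list.set(index, child);
                }
                node = list.get(index);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> root = new LinkedHashMap<>();
        put(root, "data.field[0].property", "x");
        put(root, "data.field[1].property", "y");
        put(root, "data.name", "n");
        // prints {data={field=[{property=x}, {property=y}], name=n}}
        System.out.println(root);
    }
}
```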

mlvn23, Mar 04 '15 17:03

Yes, use of a naming convention can help. FWIW, use of @JsonUnwrapped already works, so while not as convenient, this is already partially doable. But more fluent support is sort of planned; it just requires integration with introspection (traversal of POJO properties) to produce logical names, and then handling of nesting. And by planned I just mean "thought about regarding feasibility" :)

cowtowncoder, Mar 04 '15 18:03