jackson-dataformats-text

CSV: Support for Map and Nested fields

Open: DALDEI opened this issue 5 years ago

Use case: reading and writing 'complex' CSV files that interoperate with Athena, Presto, and Hive.

Example: a CSV 'serde' in Hive (or Athena or Presto) can be configured with the following attributes:

row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '#'
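These three delimiters are what a CSV reader would have to agree on to interoperate with such a table. A minimal sketch, assuming a hypothetical `HiveDelimiters.find` helper (not part of Hive or Jackson), that pulls each delimiter out of the clause text with a case-insensitive regex. It only handles single-quoted values and returns escape sequences inside the quotes undecoded.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads delimiters from a Hive "row format delimited" clause.
final class HiveDelimiters {
    // `clause` is plain words such as "fields", "collection items" or "map keys".
    // Returns the quoted delimiter text exactly as written (escapes are not decoded).
    static Optional<String> find(String ddl, String clause) {
        // Allow any run of whitespace between the words of the clause
        String words = clause.trim().replaceAll("\\s+", "\\\\s+");
        Pattern pattern = Pattern.compile(
                "(?i)\\b" + words + "\\s+terminated\\s+by\\s+'([^']*)'");
        Matcher matcher = pattern.matcher(ddl);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

Applied to the clause above, `find(ddl, "collection items")` yields `|` and `find(ddl, "map keys")` yields `#`, while a clause that is not present (for example `"lines"`) yields an empty `Optional`.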

In addition to the above, there is built-in (not customizable) support for nested collections, which are delimited by 'level', with a different delimiter per level. This makes it possible to represent values such as [ 1, [ 2, 3, { "a": [ 4 ] } ] ].
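A minimal sketch of level-based splitting, assuming a hypothetical `LevelSplitter` helper that is given the delimiters in nesting order (here `,`, `|`, `#`, the example delimiters used in this issue, not Hive's defaults). Each level splits on its own delimiter and recurses into the next; the result is nested lists of strings, with no map semantics, quoting, or escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits a record level by level, one delimiter per level.
final class LevelSplitter {
    private final char[] delimiters; // index 0 = outermost level (fields)

    LevelSplitter(char... delimiters) {
        this.delimiters = delimiters.clone();
    }

    Object parse(String text) {
        return parse(text, 0);
    }

    private Object parse(String text, int level) {
        // Leaf: no delimiters left, or this level's delimiter does not occur.
        // Without a schema, a one-element collection looks like a scalar.
        if (level >= delimiters.length || text.indexOf(delimiters[level]) < 0) {
            return text;
        }
        List<Object> items = new ArrayList<>();
        String regex = Pattern.quote(String.valueOf(delimiters[level]));
        for (String part : text.split(regex, -1)) { // -1 keeps empty items
            items.add(parse(part, level + 1));
        }
        return items;
    }
}
```

For `1,2|3|4,key#value|key2#value2,four` this yields `[1, [2, 3, 4], [[key, value], [key2, value2]], four]`: the map column comes out as a list of pairs rather than an object, and a single `key#value` entry would stay unsplit. Both gaps are why the element kinds (array vs. map) need to come from the schema or target type.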

Currently, the array element separator provides part of this. A useful addition would be a 'map' separator that works like the array separator but creates JSON Objects.

Example:

FIELDS=,
ARRAY=|
MAP=#

1,2|3|4,key#value|key2#value2|key3#value3,four

could map to the Java classes:

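class Complex {
    // Fields are public because Jackson, by default, only auto-detects
    // public fields (or properties with getters/setters)
    public int col1;
    public int[] col2;       // "2|3|4" via the array separator
    public innerclass col3;  // "key#value|key2#value2|key3#value3" via the map separator
    public String col4;
}

class innerclass {
    public String key;
    public String key2;
    public String key3;
}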
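A minimal sketch of how the proposed map separator could turn the example line into values matching `Complex`, assuming hypothetical names (`MapSeparatorSketch`, `ColumnKind`) and a schema that marks each column as a scalar, array, or map. Following the example, map entries are separated by the array separator and each key from its value by the map separator. Quoting, escaping, and type conversion are left out, and none of this is an existing Jackson API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical column kinds; a real schema would also carry names and types.
enum ColumnKind { SCALAR, ARRAY, MAP }

// Hypothetical reader for the proposed separators.
final class MapSeparatorSketch {
    static final String FIELD_SEP = ",";
    static final String ARRAY_SEP = "|";
    static final String MAP_SEP = "#";

    static List<Object> parseRow(String line, ColumnKind... kinds) {
        String[] cells = split(line, FIELD_SEP);
        if (cells.length != kinds.length) {
            throw new IllegalArgumentException(
                    "expected " + kinds.length + " columns, got " + cells.length);
        }
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            switch (kinds[i]) {
                case ARRAY:
                    row.add(Arrays.asList(split(cells[i], ARRAY_SEP)));
                    break;
                case MAP:
                    row.add(parseMap(cells[i]));
                    break;
                default: // SCALAR: keep the raw string
                    row.add(cells[i]);
            }
        }
        return row;
    }

    // Entries use the array separator; each entry is split on its first map separator.
    static Map<String, String> parseMap(String cell) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps entry order, like a JSON object
        if (cell.isEmpty()) {
            return map;
        }
        for (String entry : split(cell, ARRAY_SEP)) {
            String[] kv = entry.split(Pattern.quote(MAP_SEP), 2);
            map.put(kv[0], kv.length > 1 ? kv[1] : null); // a key without '#' maps to null
        }
        return map;
    }

    private static String[] split(String s, String sep) {
        return s.split(Pattern.quote(sep), -1); // -1 keeps empty trailing cells
    }
}
```

For the example line with kinds `SCALAR, ARRAY, MAP, SCALAR`, `parseRow` returns `[1, [2, 3, 4], {key=value, key2=value2, key3=value3}, four]`. The third element is a `LinkedHashMap`, i.e. the JSON-Object-like value that could then be bound to `innerclass`, for example with Jackson's `ObjectMapper.convertValue(map, innerclass.class)`.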



Nested structures require multiple 'levels' of both array and map separators.
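A minimal sketch of the writing side with levels, assuming a hypothetical `LevelWriter` and a convention loosely modeled on Hive's delimited format (an assumption, not checked against Hive's serializer): a list at level n joins its items with delimiter n, and a map at level n joins entries with delimiter n and each key to its value with delimiter n + 1, with keys and values nested two levels deeper. Values are written without escaping.

```java
import java.util.List;
import java.util.Map;

// Hypothetical writer: each nesting level consumes its own delimiter.
final class LevelWriter {
    private final char[] delimiters; // index 0 = outermost level (fields)

    LevelWriter(char... delimiters) {
        this.delimiters = delimiters.clone();
    }

    String write(List<?> row) {
        return write(row, 0);
    }

    private String write(Object value, int level) {
        StringBuilder sb = new StringBuilder();
        if (value instanceof List) {
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(delimiter(level));
                sb.append(write(item, level + 1));
                first = false;
            }
        } else if (value instanceof Map) {
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(delimiter(level));
                sb.append(write(e.getKey(), level + 2))
                  .append(delimiter(level + 1)) // key/value separator
                  .append(write(e.getValue(), level + 2));
                first = false;
            }
        } else {
            // Scalar: no escaping, so values must not contain any delimiter
            sb.append(value == null ? "" : value);
        }
        return sb.toString();
    }

    private char delimiter(int level) {
        if (level >= delimiters.length) {
            throw new IllegalArgumentException(
                    "nesting too deep for " + delimiters.length + " delimiters");
        }
        return delimiters[level];
    }
}
```

With delimiters `,`, `|`, `#`, writing the row `["1", [2, 3, 4], {key=value, key2=value2, key3=value3}, "four"]` (map in insertion order) reproduces the example line `1,2|3|4,key#value|key2#value2|key3#value3,four`. With a fourth delimiter `;`, the nested value `[ 1, [ 2, 3, { "a": [ 4 ] } ] ]` is written as `1,2|3|a;4`, showing how each extra level of nesting consumes delimiters.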

Hive reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable

DALDEI, Aug 02 '20 13:08