jackson-dataformats-text
CSV: Support for Map and Nested fields
Use Case: reading/writing 'complex' CSV files interoperable with Athena / Presto / Hive
Example: a CSV 'serde' in Hive (or Athena or Presto) can support the following attributes:
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '#'
In addition to the above, there is built-in (not customizable) support for nested collections, which are delimited by 'level' with a different delimiter per level. This enables representing structures such as [ 1, [ 2, 3, { "a": [ 4 ] } ] ].
Currently the array element separator provides part of this. A useful addition would be a 'map' separator that works like the array separator but creates JSON Objects.
example:
FIELDS=,
ARRAY=|
MAP=#
1,2|3|4,key#value|key2#value2|key3#value3,four
could map to the Java classes
class Complex {
    int col1;
    int[] col2;       // from 2|3|4
    InnerClass col3;  // from key#value|key2#value2|key3#value3
    String col4;
}
class InnerClass {
    String key;
    String key2;
    String key3;
}
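To make the intended mapping concrete, here is a minimal plain-Java sketch of how the example row could be split with the three separators. It does not use Jackson and is not a proposal for the actual API; the class name `ComplexRowSketch` and its helper methods are hypothetical, and CSV quoting and escaping are ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of jackson-dataformats-text.
public class ComplexRowSketch {
    // Separators taken from the example above.
    static final String FIELD_SEP = ",";
    static final String ARRAY_SEP = "|";
    static final String MAP_SEP = "#";

    /** Splits on a literal separator, keeping trailing empty cells. */
    static String[] split(String text, String sep) {
        return text.split(Pattern.quote(sep), -1);
    }

    /** Parses a cell such as "2|3|4" into an int array. */
    static int[] parseIntArray(String cell) {
        return Arrays.stream(split(cell, ARRAY_SEP))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Parses a cell such as "key#value|key2#value2" into an ordered map. */
    static Map<String, String> parseMap(String cell) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : split(cell, ARRAY_SEP)) {
            // Split each entry into at most two parts: key and value.
            String[] kv = entry.split(Pattern.quote(MAP_SEP), 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return result;
    }

    public static void main(String[] args) {
        String row = "1,2|3|4,key#value|key2#value2|key3#value3,four";
        String[] cells = split(row, FIELD_SEP);

        int col1 = Integer.parseInt(cells[0]);
        int[] col2 = parseIntArray(cells[1]);
        Map<String, String> col3 = parseMap(cells[2]);
        String col4 = cells[3];

        // prints: 1 [2, 3, 4] {key=value, key2=value2, key3=value3} four
        System.out.println(col1 + " " + Arrays.toString(col2) + " " + col3 + " " + col4);
    }
}
```

The map produced for the third column has the shape of a JSON Object, which a databind layer could then bind to a class with `key`, `key2`, and `key3` properties.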
Nested structures require 'levels' of both array and map separators.
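The idea of one delimiter per nesting level can be sketched as a small recursive split. This is an illustration under assumptions, not Hive's or Jackson's implementation: the class `NestedLevelsSketch` is hypothetical, the `;` second-level delimiter is chosen only for readability, and only nested arrays (not maps) are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: one delimiter per nesting depth.
public class NestedLevelsSketch {
    /**
     * Splits text with delimiters.get(level), then recurses into each part
     * with the next level's delimiter. When no delimiters remain, the text
     * is returned unchanged as a leaf String.
     */
    static Object parse(String text, List<String> delimiters, int level) {
        if (level >= delimiters.size()) {
            return text;
        }
        List<Object> items = new ArrayList<>();
        for (String part : text.split(Pattern.quote(delimiters.get(level)), -1)) {
            items.add(parse(part, delimiters, level + 1));
        }
        return items;
    }

    public static void main(String[] args) {
        // Level 0 uses '|', level 1 uses ';' (an assumed delimiter).
        List<String> delimiters = Arrays.asList("|", ";");
        // prints: [[1], [2, 3], [4]]
        System.out.println(parse("1|2;3|4", delimiters, 0));
    }
}
```

Note that this sketch always splits to the full depth, so every element becomes a list at each level. A real implementation would presumably let the target type decide how deep to split each cell.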
Hive reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable