jackson-dataformats-text
CSV: Support for Master/Detail - variant repeating column formats
CRM systems commonly export master/detail data, such as invoices, as 'CSV'.
An anonymized example:
HEADERST|A-SYSTEM|04/28/20|200557 |DAILY |
CUSTOMER|111111111|3041|05/28/20|05/28/20||05/28/20|US|PRINT |||STANDARD |MAILTO ID NO. |
FROMADDR|PROVO/4H |4H |835 WEST SAN JOSE ST |OGDEN, UT 84401 ||(800)253-0277|
BILLADDR|ABC CO |1222 EAST 111 NORTH |SALT LAKE, UT 84040 ||||US |
REMTADDR|SENDTO |4H |LB 11111 |PO BOX 22222 |MYTOWN, WA 99999-5143 |
SHIPADDR|23497357|ABC CO CUSTOMER# 11111|698 N. PLAINS ||PROVO |UT|11111 |SALT LAKE CITY/4H |
STMTDTLS|04/22/20|1111111 |INV|222.13|0.00|0.00|222.13|05/29/20|NET 7 DAYS |
STMTDTLS|04/25/20|2222222 |INV|333.21|0.00|0.00|333.21|06/01/20|NET 7 DAYS |
STMTDTLS|04/26/20|3333333 |INV|383.22|0.00|0.00|383.22|06/02/20|NET 7 DAYS |
STMTDTLS|04/26/20|4445444 |INV|1799.95|0.00|0.00|1799.95|06/02/20|NET 7 DAYS |
STMTDTLS|04/27/20|5555555 |INV|22.56|0.00|0.00|22.56|06/03/20|NET 7 DAYS |
STMTDTLS|04/28/20|5555555 |INV|44.18|0.00|0.00|55.18|06/04/20|NET 7 DAYS |
STMTTOTL|4444.25|0.00|0.00|0.00|0.00|4444.25|
STMTMSGS|**For customer inquiries, call 1-800-222-3377. Option 1, Option 2** ||
The identifying feature is a column (usually col 1) which indicates the 'type' of that record. All records of the same 'type' have the same structure/schema. The above snippet repeats; the outermost repeating block represents 1 logical 'row'.
Short of modeling the actual nested structure, it would help to be able to specify alternate 'schemas', each identified by a column value and bound to a different class/POJO, with records read sequentially. These classes could form a hierarchy, which would make it easier to integrate into the JSON data model, e.g. all derived from the same base class with only the 1 shared field (col 1). That should map well to polymorphic serialization with a type field.
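To make the "outermost repeating block" idea concrete, here is a minimal plain-Java sketch (no Jackson involved) that groups lines into logical rows by looking only at the type column. The class name `RecordGrouper` and the choice of which record type opens a block are assumptions made for illustration; the sample above does not say which tag starts a new block, so it is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordGrouper {

    /** Returns the record type: the first pipe-delimited column, trimmed. */
    public static String recordType(String line) {
        int bar = line.indexOf('|');
        return (bar < 0 ? line : line.substring(0, bar)).trim();
    }

    /**
     * Splits the lines into blocks. A new block starts at every line whose
     * record type equals startType (an assumption supplied by the caller).
     * Lines that appear before the first startType line form their own block.
     */
    public static List<List<String>> group(List<String> lines, String startType) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            if (current == null || recordType(line).equals(startType)) {
                current = new ArrayList<>();
                blocks.add(current);
            }
            current.add(line);
        }
        return blocks;
    }
}
```

For example, `RecordGrouper.group(lines, "CUSTOMER")` would treat each `CUSTOMER` record and everything after it, up to the next `CUSTOMER`, as one logical row, if that is indeed how the export is structured.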
What I was thinking of doing with this is 'forking' the input stream and choosing the schema on a row by row basis -- but to do that requires parsing the stream twice.
An alternative -- maybe this is possible now -- is to have a 2 step deserializer: the first step reads each row into a List<String>, and the second selects a separate schema based on list[0]. The problem is getting the CSV parser to take a List<String> instead of an InputStream as its input.
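A rough sketch of that two-step idea, written in plain Java rather than against the Jackson API: step one splits a line into a `List<String>`, step two picks a binder by `list[0]`. Everything here is hypothetical: the class names (`TwoStepReader`, `StatementDetail`, `StatementTotal`, `Unknown`), the field names, and the guess at what each column means are assumptions for illustration, and the naive split does not handle quoting or escaped separators.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TwoStepReader {

    /** Base type: the only field shared by every record is the type column. */
    public static abstract class Rec {
        public final String type;
        protected Rec(String type) { this.type = type; }
    }

    /** Hypothetical binding for STMTDTLS rows; column meanings are guesses. */
    public static final class StatementDetail extends Rec {
        public final String date;
        public final String invoiceNumber;
        public final String amount;
        StatementDetail(List<String> cols) {
            super(cols.get(0));
            this.date = cols.get(1);
            this.invoiceNumber = cols.get(2);
            this.amount = cols.get(4);
        }
    }

    /** Hypothetical binding for STMTTOTL rows. */
    public static final class StatementTotal extends Rec {
        public final String total;
        StatementTotal(List<String> cols) {
            super(cols.get(0));
            this.total = cols.get(1);
        }
    }

    /** Fallback for record types that have no binder registered. */
    public static final class Unknown extends Rec {
        public final List<String> columns;
        Unknown(List<String> cols) {
            super(cols.get(0));
            this.columns = cols;
        }
    }

    // Step 2 lookup table: record type -> function that binds the columns.
    private static final Map<String, Function<List<String>, Rec>> BINDERS = new HashMap<>();
    static {
        BINDERS.put("STMTDTLS", StatementDetail::new);
        BINDERS.put("STMTTOTL", StatementTotal::new);
    }

    /** Step 1: split one line on '|' and trim each column (no quoting support). */
    public static List<String> split(String line) {
        List<String> cols = new ArrayList<>();
        for (String part : line.split("\\|", -1)) {
            cols.add(part.trim());
        }
        return cols;
    }

    /** Steps 1 and 2 together: split, then dispatch on the first column. */
    public static Rec read(String line) {
        List<String> cols = split(line);
        Function<List<String>, Rec> binder = BINDERS.getOrDefault(cols.get(0), Unknown::new);
        return binder.apply(cols);
    }
}
```

This reads the input only once, since the type column is inspected after the row has already been tokenized. Whether the CSV module can be made to work this way internally is a separate question.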
Have you looked into @JsonSubTypes? Scroll down to section 5:
https://www.baeldung.com/jackson-annotations
This is also related to another recent ticket, https://github.com/FasterXML/jackson-dataformats-text/issues/202, which has useful commentary from the author himself about polymorphism with CSV.
I encountered the same issue. @JsonSubTypes does not work for CSV. Jackson determines the order of properties for the base class and throws "Unrecognized column 'type': known columns:" when trying to serialize any subclass with additional columns. Jackson should determine the order of properties for each subclass.