
CSV: Support for Master/Detail - variant repeating column formats

Open DALDEI opened this issue 5 years ago • 5 comments

Common in CRM systems is exporting as 'CSV' master/detail data like Invoices.

An example: (anonymized)

HEADERST|A-SYSTEM|04/28/20|200557  |DAILY |  
CUSTOMER|111111111|3041|05/28/20|05/28/20||05/28/20|US|PRINT |||STANDARD  |MAILTO ID NO. |  
FROMADDR|PROVO/4H |4H |835 WEST SAN JOSE ST |OGDEN, UT 84401 ||(800)253-0277|  
BILLADDR|ABC CO |1222 EAST 111 NORTH |SALT LAKE, UT 84040 ||||US |  
REMTADDR|SENDTO |4H |LB 11111 |PO BOX 22222 |MYTOWN, WA 99999-5143 |  
SHIPADDR|23497357|ABC CO  CUSTOMER# 11111|698 N. PLAINS ||PROVO |UT|11111 |SALT LAKE CITY/4H |  
STMTDTLS|04/22/20|1111111 |INV|222.13|0.00|0.00|222.13|05/29/20|NET 7 DAYS  |  
STMTDTLS|04/25/20|2222222 |INV|333.21|0.00|0.00|333.21|06/01/20|NET 7 DAYS  |  
STMTDTLS|04/26/20|3333333 |INV|383.22|0.00|0.00|383.22|06/02/20|NET 7 DAYS  |  
STMTDTLS|04/26/20|4445444 |INV|1799.95|0.00|0.00|1799.95|06/02/20|NET 7 DAYS  |  
STMTDTLS|04/27/20|5555555 |INV|22.56|0.00|0.00|22.56|06/03/20|NET 7 DAYS  |  
STMTDTLS|04/28/20|5555555 |INV|44.18|0.00|0.00|55.18|06/04/20|NET 7 DAYS  |  
STMTTOTL|4444.25|0.00|0.00|0.00|0.00|4444.25|  
STMTMSGS|**For customer inquiries, call 1-800-222-3377. Option 1, Option 2** ||  

The identifying feature is a column (usually column 1) that indicates the 'type' of each record. All records of the same type share the same structure/schema. The snippet above repeats; each occurrence of the outermost repeating block represents one logical 'row'.
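To make the "outermost repeating block" idea concrete, here is a rough sketch in plain Java (no Jackson involved) that splits a flat list of lines into logical rows. It assumes that a particular record type, passed in as `startType` (for the sample above, presumably HEADERST), opens each block; the sample does not confirm that, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackson.
class BlockGrouper {

    /**
     * Splits a flat list of lines into blocks. A new block starts at every
     * line whose first '|'-delimited column equals startType (assumption:
     * one record type marks the start of each logical row).
     */
    static List<List<String>> group(List<String> lines, String startType) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines between records
            }
            int bar = line.indexOf('|');
            String type = (bar < 0 ? line : line.substring(0, bar)).trim();
            // Lines seen before the first startType record still get a block.
            if (current == null || type.equals(startType)) {
                current = new ArrayList<>();
                blocks.add(current);
            }
            current.add(line);
        }
        return blocks;
    }
}
```

Each inner list then holds the master record plus its detail records, in file order, ready to be mapped per record type.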

Short of modeling the actual nested structure, it would help to be able to specify alternate 'schemas', each identified by a column value and mapped to a different class/POJO, with records read sequentially. These classes could form a hierarchy that makes it easier to integrate into the JSON data model, e.g. all derived from the same base class with only the one shared field (col 1). That should map well to polymorphic serialization with a type field.

What I was thinking of doing is 'forking' the input stream and choosing the schema on a row-by-row basis, but that requires parsing the stream twice.

An alternative (maybe this is possible now) is a two-step deserializer: the first step reads each row into a List<String>, and the second selects a separate schema based on list[0]. The problem is getting the CSV parser to take a List<String> instead of an InputStream as its input.
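A minimal sketch of the two-step idea in plain Java, deliberately avoiding Jackson so nothing here should be read as its API. Step 1 splits a line into a List<String>; step 2 looks up a per-type "schema" (here just a constructor) keyed by column 1. All class names are hypothetical, and the field names (date, invoice number, amount, total) are guesses from the sample data rather than a documented layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout; this is not Jackson API.
class TypedRecordReader {

    /** Base class: the only field every record shares is the type tag in column 1. */
    static abstract class TypedRow {
        final String type;
        TypedRow(String type) { this.type = type; }
    }

    /** STMTDTLS record; column meanings are guessed from the sample. */
    static final class StatementDetail extends TypedRow {
        final String date;
        final String invoiceNumber;
        final String amount;
        StatementDetail(List<String> cols) {
            super(cols.get(0));
            this.date = cols.get(1);
            this.invoiceNumber = cols.get(2);
            this.amount = cols.get(4);
        }
    }

    /** STMTTOTL record; only the first amount column is modeled. */
    static final class StatementTotal extends TypedRow {
        final String total;
        StatementTotal(List<String> cols) {
            super(cols.get(0));
            this.total = cols.get(1);
        }
    }

    /** Fallback for record types with no registered schema: keeps the raw columns. */
    static final class RawRow extends TypedRow {
        final List<String> columns;
        RawRow(List<String> cols) {
            super(cols.get(0));
            this.columns = cols;
        }
    }

    // One "schema" per record type, keyed by the value of column 1.
    private static final Map<String, Function<List<String>, TypedRow>> SCHEMAS = new HashMap<>();
    static {
        SCHEMAS.put("STMTDTLS", StatementDetail::new);
        SCHEMAS.put("STMTTOTL", StatementTotal::new);
    }

    /** Step 1: split one line into trimmed columns (no quoting or escaping handled). */
    static List<String> splitColumns(String line) {
        List<String> cols = new ArrayList<>();
        for (String part : line.split("\\|", -1)) {
            cols.add(part.trim());
        }
        return cols;
    }

    /** Step 2: choose the schema from column 1 and build the matching object. */
    static TypedRow parseLine(String line) {
        List<String> cols = splitColumns(line);
        return SCHEMAS.getOrDefault(cols.get(0), RawRow::new).apply(cols);
    }
}
```

This only parses each line once, at the cost of doing the column-to-field binding by hand instead of through a CSV schema.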

DALDEI avatar Aug 02 '20 13:08 DALDEI

Have you looked into @JsonSubTypes? Scroll down to section 5: https://www.baeldung.com/jackson-annotations

jdimeo avatar Aug 13 '20 20:08 jdimeo

This is also related to another recent ticket, https://github.com/FasterXML/jackson-dataformats-text/issues/202, which has useful commentary from the author himself about polymorphism with CSV.

jdimeo avatar Aug 13 '20 20:08 jdimeo

I encountered the same issue. @JsonSubTypes does not work for CSV. Jackson determines the order of properties from the base class and throws "Unrecognized column 'type': known columns:" when trying to serialize any subclass with additional columns. Jackson should determine the order of properties for each subclass.

kdebski85 avatar May 18 '21 09:05 kdebski85