jackson-dataformats-text
Support parsing CSV with header regardless of unknown columns
When reading the following CSV with jackson-dataformat-csv 2.11.4
name,weight,age
Roger,69,27
Chris,89,53
using the following snippet:
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

CsvMapper csvMapper = new CsvMapper();
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true)
        .addColumn("name").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person>readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int age;
}
a CsvMappingException is thrown (Too many entries: expected at most 2) because the column weight is not known to the CsvSchema.
csvMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
still leads to the same CsvMappingException.
Thus, please introduce a new CsvParser feature, e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default), that allows reading CSV regardless of unknown columns.
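To make the request concrete, here is a rough sketch of the intended usage; IGNORE_UNKNOWN_COLUMNS is the proposed feature and does not exist yet, so the enable call below is purely hypothetical:

import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

CsvMapper csvMapper = new CsvMapper();
// Hypothetical feature from this request; it does not exist in 2.11.4.
csvMapper.enable(CsvParser.Feature.IGNORE_UNKNOWN_COLUMNS);

CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true)
        .addColumn("name").addColumn("age").build();

// Expected outcome: the extra "weight" header column is skipped instead of
// causing "Too many entries: expected at most 2".
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person>readValues(csv)
        .readAll();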
Reorder the columns:
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
        .addColumn("name").addColumn("age").build();
or skip adding columns explicitly when using setUseHeader(true):
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
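A minimal sketch of how the second suggestion plays out with the sample CSV above, assuming FAIL_ON_UNKNOWN_PROPERTIES is disabled so that binding skips the extra weight column:

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

CsvMapper csvMapper = new CsvMapper();
// Without this, the unmapped "weight" column taken from the header would
// fail during binding to Person instead of during schema validation.
csvMapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

// Header-only schema: column names and order are taken from the first line.
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person>readValues(csv)
        .readAll();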
> Reorder the columns:
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true).addColumn("name").addColumn("age").build();
But the use case expects the columns name and age in the given order and should fail otherwise. At the moment, explicitly declaring header columns and the column reordering feature are mutually exclusive due to this: https://github.com/FasterXML/jackson-dataformats-text/blob/810772312735f1fb89d6fa37dd70e150e9cc783b/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L787 which can be considered a bug.
> or skip adding columns explicitly when using setUseHeader(true):
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
But then the FAIL_ON_MISSING_COLUMNS feature can no longer be used, and name and age are no longer required columns.
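For reference, a sketch of what is lost (the wiring below is my assumption of the intended strict setup): FAIL_ON_MISSING_COLUMNS checks data rows against the columns declared in the schema, and with a header-only schema those columns are whatever the file's header contains, so name and age in particular are no longer enforced.

import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

CsvMapper csvMapper = new CsvMapper();
// Fails when a data row has fewer values than the schema declares...
csvMapper.enable(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS);

// ...but with a header-only schema the declared columns are whatever the
// file's header line contains, so "name" and "age" are not required.
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();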
The same issue was encountered with jackson-dataformat-csv 2.13.4 while trying to parse a CSV file (>100 columns) into a Java entity (10 attributes). I tried
ObjectReader csvReader = csvMapper
        .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
        .readerFor(BlackList.class)
        .with(csvSchema);
But I found that the values in the unknown columns were parsed into the next column, which messed up the data in the DB. As @bjmi mentioned, IGNORE_UNKNOWN_COLUMNS will likely solve my problem.
> Thus, please introduce a new CsvParser feature, e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default), that allows reading CSV regardless of unknown columns.
I can get it to work if, when reading, I use a schema with .withHeader() and .withColumnReordering().
FAIL_ON_UNKNOWN_PROPERTIES is disabled for me, but I didn't test whether it's necessary.
So in the end I am using two different schemas: one for writing, without column reordering, and one for reading, with column reordering.
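For anyone landing here later, a sketch of that two-schema setup as I understand it (the exact schema construction and the variable names are my assumptions; withHeader() and withColumnReordering() are the calls mentioned above):

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

CsvMapper csvMapper = new CsvMapper();
csvMapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

// Writing: fixed column order derived from the POJO, no reordering.
CsvSchema writeSchema = csvMapper.schemaFor(Person.class).withHeader();

// Reading: header-driven, columns may appear in any order in the input.
CsvSchema readSchema = csvMapper.schemaFor(Person.class)
        .withHeader()
        .withColumnReordering(true);

String csv = csvMapper.writer(writeSchema).writeValueAsString(persons);
List<Person> readBack = csvMapper
        .readerFor(Person.class)
        .with(readSchema)
        .<Person>readValues(csv)
        .readAll();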