FAIL_ON_MISSING_COLUMNS doesn't work with column reordering, header line
When reading the following CSV with jackson-dataformat-csv 2.11.4
name
Roger
Chris
using the following snippet
CsvMapper csvMapper = new CsvMapper();
csvMapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
        .addColumn("name").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person>readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int age;
}
doesn't throw a CsvMappingException although the age column is missing from the CSV.
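For contrast, here is a minimal sketch (not from the report) of the case where the feature is expected to kick in: an explicit schema, no header line, and a data row with fewer values than the schema declares. The Person class is the same as above.
CsvMapper csvMapper = new CsvMapper();
csvMapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
// no header, no reordering: the schema alone defines the columns
CsvSchema csvSchema = CsvSchema.builder()
        .addColumn("name").addColumn("age").build();
try {
    csvMapper.readerFor(Person.class)
            .with(csvSchema)
            .<Person>readValues("Roger\n") // row is missing the "age" value
            .readAll();
} catch (CsvMappingException e) {
    // this is the failure the report expects for the header-based case as well
}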
The same thing is true of the strict headers feature. I really don't care about the order of the columns, or any additional ones; I just care that the ones I want are present.
This seems like a common requirement; it is a shame it cannot be enforced.
There is also no way to have an error if a column name is repeated in the header when reordering is in use. It silently chooses one.
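For illustration, a rough sketch of that duplicated-header case (my own sketch, reading into a plain Map rather than the Person class):
// the header repeats "name"; with reordering enabled one of the two columns
// is used instead of an error being reported (per the observation above)
String csv = "name,name\nRoger,Bob\n";
CsvSchema schema = CsvSchema.builder()
        .setUseHeader(true).setReorderColumns(true)
        .addColumn("name").build();
List<Map<String, String>> rows = new CsvMapper()
        .readerFor(Map.class)
        .with(schema)
        .<Map<String, String>>readValues(csv)
        .readAll();
// rows.get(0).get("name") silently holds whichever of the two values "wins"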
I have been hoping to look into this for a while now, but for the past 3 months my paying job has taken over almost all of my coding time, since I do not get paid to work on Jackson (it is not part of my day job).
So, @mamos-helloself PRs would obviously be welcome.
I may be able to at least add the failing test soon now, though.
Hmmh. Looking at this now I wonder if the reason is due to "use header" -- header really only has 1 column and if that is to be considered authoritative, there is nothing missing? So perhaps it goes to precedence/definition of how explicit schema should interact with header line...
I'll have to look into this more and will first add a failing test just to reproduce what is seen here.
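Not the actual test, but a rough sketch of what such a failing reproduction could look like, reusing the schema and Person class from the report and expecting the exception that is currently not thrown:
@Test
public void missingSchemaColumnInHeaderShouldFail() throws Exception {
    CsvMapper mapper = new CsvMapper();
    mapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
    CsvSchema schema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
            .addColumn("name").addColumn("age").build();
    try {
        mapper.readerFor(Person.class)
                .with(schema)
                .<Person>readValues("name\nRoger\nChris\n")
                .readAll();
        fail("Should have failed: 'age' column missing from header");
    } catch (CsvMappingException e) {
        // expected once the issue is addressed
    }
}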
Hmmh. Looking at this now I wonder if the reason is due to "use header" -- header really only has 1 column and if that is to be considered authoritative, there is nothing missing?
The one-column example was probably too simple. What about this two-column CSV:
name,age
Roger,27
Chris,53
CsvMapper csvMapper = new CsvMapper();
csvMapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
        .addColumn("name").addColumn("weight").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person>readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int weight;
    public int age;
}
doesn't throw a CsvMappingException although the weight column is missing from the CSV.
Sorry I misunderstood your last comment, but the example is valid.
My daily use case has 3 requirements (a rough configuration sketch follows the list):
- Column order shouldn't matter - mapping is done by column headers
- Fail when columns are missing
- Ignore superfluous columns i.e. column headers that aren't defined previously - #286
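A rough configuration sketch aiming at these three requirements, assuming the FAIL_ON_MISSING_HEADER_COLUMNS feature that shows up later in this thread (2.14+); how superfluous header columns are handled is the subject of #286 and is not covered here:
CsvMapper mapper = new CsvMapper();
// (2) fail when declared columns are missing, per data row and in the header
mapper.configure(CsvParser.Feature.FAIL_ON_MISSING_COLUMNS, true);
mapper.configure(CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS, true); // 2.14+
CsvSchema schema = CsvSchema.builder()
        .setUseHeader(true)
        .setReorderColumns(true) // (1) column order does not matter; map by header names
        .addColumn("name").addColumn("age").build();
List<Person> persons = mapper.readerFor(Person.class)
        .with(schema)
        .<Person>readValues(csv)
        .readAll();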
Is there a real chance that this issue will be fixed?
Yes, if and when someone has time to look into it. I have been hoping to address it, but I am hopelessly swamped with work, helping others with Jackson PRs, releases, all that stuff. But I do hope to help here in some form.
It is likely that for this to move in near future, a PR would be needed.
On the plus side, as per my earlier comment, a failing test is now included in the repo (it has to be run via IDE; it is excluded from the default test run).
Hmmm. I suspect that the issue is with the actual header line only having one column, and thereby "extra" columns of CsvSchema being discarded. I think it is reasonable to expect either that:
- If the header line has fewer columns than the schema, an exception would be thrown, OR
- If the header line has fewer columns, the "missing" columns from CsvSchema would simply be appended in the order they were given
I think I'll proceed with (2), although I wonder if some configurability might be needed anyway, to alter behavior.
And yes, I am finally circling back here, hoping to resolve this for 2.14(.0).
EDIT: re-reading full discussion, it does sound like (1) is what is wanted actually. So I will try doing that instead; this would also make the test I added pass.
Ok: changed the behavior to optionally throw an exception if column headers are missing (compared to the registered CsvSchema), controlled by the new
CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS
which defaults to true (that is, by default this problem is reported).
Will be in the 2.14.0 release.
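For reference, enabling the check explicitly looks like the existing configure(...) calls earlier in this thread (per the comment above it is on by default, so this is only for clarity):
CsvMapper mapper = new CsvMapper();
mapper.configure(CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS, true);
// or, to keep the old lenient behavior:
// mapper.configure(CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS, false);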
Perhaps I'm misunderstanding something, but I'm not seeing anywhere that the new FAIL_ON_MISSING_HEADER_COLUMNS feature is actually checked to decide whether this new behavior should occur?
I.e., should there be something like this?
diff --git a/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java b/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java
index b7b050e4..c3621ae9 100644
--- a/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java
+++ b/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java
@@ -862,14 +862,16 @@ public class CsvParser
}
}
// [dataformats-text#285]: Are we missing something?
- int diff = schemaColumnCount - newColumnCount;
- if (diff > 0) {
- Set<String> oldColumnNames = new LinkedHashSet<>();
- _schema.getColumnNames(oldColumnNames);
- oldColumnNames.removeAll(newSchema.getColumnNames());
- _reportCsvMappingError(String.format("Missing %d header column%s: [\"%s\"]",
- diff, (diff == 1) ? "" : "s",
- String.join("\",\"", oldColumnNames)));
+ if (CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS.enabledIn(_formatFeatures)) {
+ int diff = schemaColumnCount - newColumnCount;
+ if (diff > 0) {
+ Set<String> oldColumnNames = new LinkedHashSet<>();
+ _schema.getColumnNames(oldColumnNames);
+ oldColumnNames.removeAll(newSchema.getColumnNames());
+ _reportCsvMappingError(String.format("Missing %d header column%s: [\"%s\"]",
+ diff, (diff == 1) ? "" : "s",
+ String.join("\",\"", oldColumnNames)));
+ }
}
// otherwise we will use what we got