jackson-dataformats-text icon indicating copy to clipboard operation
jackson-dataformats-text copied to clipboard

CSV: FAIL_ON_MISSING_COLUMNS does not work as expected when combined with withColumnReordering(true)

Open ivangreene opened this issue 3 years ago • 2 comments

The CSV feature CsvParser.Feature.FAIL_ON_MISSING_COLUMNS does not work as expected when using a typed CsvSchema with withColumnReordering(true). If a column appears in the POJO, but not in the file, no failure occurs. This seems to be due to the withColumnReordering(true) completely rebuilding the columns, but the FAIL_ON_MISSING_COLUMNS check only occurs during reading of individual rows of data, and it only checks against the rebuilt columns (from the file header, not against the POJO's columns).

Relevant lines:

  • Check of reordersColumns: https://github.com/FasterXML/jackson-dataformats-text/blob/2.13/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L787
    • If reordersColumns is true, we jump to here and fully rebuild the columns: https://github.com/FasterXML/jackson-dataformats-text/blob/2.13/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L818
    • Somewhere around here is probably the best place to check the rebuilt schema columns against the original schema columns, if FAIL_ON_MISSING_COLUMNS is enabled: https://github.com/FasterXML/jackson-dataformats-text/blob/2.13/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L850
  • This seems to be the only place we check for missing columns right now: https://github.com/FasterXML/jackson-dataformats-text/blob/2.13/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L924-L925
    • The method called: https://github.com/FasterXML/jackson-dataformats-text/blob/2.13/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L1095

ivangreene avatar Jun 03 '21 21:06 ivangreene

I ran into this issue today. In my case reorderColumns was false. In this case no validation is done, and the parsing only fails if there actually is data. Which means that a one-line csv file is considered empty (but valid) even if the header is not valid at all.

I would be good if the existence of the headers is validated when CsvParser.Feature.FAIL_ON_MISSING_COLUMNS is enabled (with and without column reordering).

WellingR avatar Jun 28 '21 15:06 WellingR

What would help here would be a test case to show the details -- I think I understand @WellingR's case but that seems different from @ivangreene 's original case. It might also be necessary to file a different issue for @WellingR's case if that is only related to "header only" content; with something that would constitute invalid header (I also don't know what exactly that part means come to think of it).

cowtowncoder avatar Aug 12 '21 03:08 cowtowncoder