Add a header line assertion/check
Is your feature request related to a problem? Please describe.
I want to assert the CSV file header content when working with NamedCsvRecord.
Today there is no way to assert the header line of the CSV file without checking all records.
Describe the solution you'd like
final var handler = NamedCsvRecordHandler.builder()
    .expectedHeader("header A", "header B") // proposal
    .build();
try (CsvReader<NamedCsvRecord> csvReader = CsvReader.builder()
        .build(handler, file.getInputStream())) {
    return csvReader.stream()
        .map(this.csvRecordMapper::map)
        .collect(Collectors.toCollection(LinkedHashSet::new));
}
So the idea is to add an expectedHeader list to NamedCsvRecordHandler and use it to validate the header line.
Goal alignment
This seems to be aligned: it simplifies header verification, avoids a check on every line, and does not impact performance.
I can do a PR if you are aligned, @osiegmar.
There are several ways to assert the header line of a CSV file without having to check all records.
Depending on your needs, you could use an enhanced for-each loop:
boolean first = true;
try (var csvReader = CsvReader.builder().ofNamedCsvRecord(file)) {
    for (NamedCsvRecord rec : csvReader) {
        if (first) {
            first = false;
            validateHeader(rec.getHeader());
        }
        // process the record ...
    }
}
Or, when preferring a stream-based approach, you can use a Gatherer (in Java >= 24) to validate the header only once before processing the records:
// The output type is NamedCsvRecord so that downstream stages keep the record type.
Gatherer<NamedCsvRecord, AtomicBoolean, NamedCsvRecord> validator = Gatherer.ofSequential(
    () -> new AtomicBoolean(true),
    (first, rec, downstream) -> {
        // Validate the header only for the first record.
        if (first.getAndSet(false)) {
            validateHeader(rec.getHeader());
        }
        return downstream.push(rec);
    }
);
try (var csvStream = CsvReader.builder().ofNamedCsvRecord(file).stream()) {
    csvStream
        .gather(validator)
        .forEach(rec -> {
            // process the record ...
        });
}
Alternatively, you can access the header directly without managing a "special" first state. An iterator-based approach works well for this:
var handler = NamedCsvRecordHandler.of(builder ->
    builder.returnHeader(true)
);
try (var iterator = CsvReader.builder().build(handler, file).iterator()) {
    // With returnHeader(true), the first record returned is the header itself.
    if (iterator.hasNext()) {
        validateHeader(iterator.next().getFields());
    }
    iterator.forEachRemaining(rec -> {
        // process the record ...
    });
}
Combinations are also possible, as you could start with an iterator to validate the header and then switch to a stream.
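A minimal sketch of that combination, using only the JDK: the first element is taken from the iterator and validated, and the remaining elements are wrapped in a stream. Plain `List<String>` rows stand in for FastCSV records here, and `HeaderThenStream` and `validatedStream` are hypothetical names for illustration, not part of FastCSV.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper for illustration; rows are modeled as List<String>.
final class HeaderThenStream {

    private HeaderThenStream() {
    }

    // Validates the first row (if any) as the header and streams the remaining rows.
    static Stream<List<String>> validatedStream(Iterator<List<String>> rows,
                                                List<String> expectedHeader) {
        if (rows.hasNext()) {
            List<String> header = rows.next();
            if (!expectedHeader.equals(header)) {
                throw new IllegalStateException(
                    "Header mismatch: expected %s but found %s"
                        .formatted(expectedHeader, header));
            }
        }
        // The iterator is now positioned after the header row;
        // expose the rest as a sequential, ordered stream.
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(rows, Spliterator.ORDERED), false);
    }
}
```

With a real reader, the iterator would still have to be closed once the stream has been consumed, for example by keeping the try-with-resources block from the iterator example around the stream pipeline.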
All approaches share this validation method:
static void validateHeader(final List<String> header) {
    var expectedHeader = List.of("header A", "header B");
    if (!expectedHeader.equals(header)) {
        throw new IllegalStateException("Header mismatch: expected %s but found %s"
            .formatted(expectedHeader, header));
    }
}
Adding an expectedHeader method to NamedCsvRecordHandler would certainly make header validation easier, but it is narrowly focused on that specific task. Some users may want to ignore the order of header fields or validate only certain required fields. Therefore, a more flexible approach, such as a validateHeader method, could be more useful in the long term.
var handler = NamedCsvRecordHandler.of(builder ->
    builder.validateHeader(header -> {
        if (!header.contains("header A")) {
            throw new IllegalStateException("Header must contain at least 'header A'");
        }
    })
);
A few predefined validators could be provided to cover the most common use cases.
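To make the idea more concrete, here is a rough sketch of what such predefined validators might look like, assuming the validateHeader callback accepts a `Consumer<List<String>>`. The class `HeaderValidators` and its methods `exactly`, `inAnyOrder`, and `containing` are hypothetical names for discussion only; nothing like this exists in FastCSV today.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical validator factory for discussion; not an existing FastCSV API.
final class HeaderValidators {

    private HeaderValidators() {
    }

    // Requires exactly the given fields, in the given order.
    static Consumer<List<String>> exactly(String... expected) {
        List<String> expectedHeader = List.of(expected);
        return header -> {
            if (!expectedHeader.equals(header)) {
                throw new IllegalStateException(
                    "Header mismatch: expected %s but found %s"
                        .formatted(expectedHeader, header));
            }
        };
    }

    // Requires the same fields, ignoring their order.
    static Consumer<List<String>> inAnyOrder(String... expected) {
        List<String> sortedExpected = Stream.of(expected).sorted().toList();
        return header -> {
            if (!sortedExpected.equals(header.stream().sorted().toList())) {
                throw new IllegalStateException(
                    "Header mismatch: expected %s in any order but found %s"
                        .formatted(sortedExpected, header));
            }
        };
    }

    // Requires that all given fields are present; additional fields are allowed.
    static Consumer<List<String>> containing(String... required) {
        List<String> requiredFields = List.of(required);
        return header -> {
            List<String> missing = requiredFields.stream()
                .filter(field -> !header.contains(field))
                .toList();
            if (!missing.isEmpty()) {
                throw new IllegalStateException(
                    "Header is missing required fields: " + missing);
            }
        };
    }
}
```

Under that assumption, the handler configuration could read `builder.validateHeader(HeaderValidators.containing("header A"))`, while custom lambdas would remain possible for anything the predefined validators do not cover.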
If you want, you can create a PR for further discussion and implementation of this feature.