
Eliminate iterator allocations when writing CsvRecord

Open · brunnsbe opened this issue 3 weeks ago · 0 comments

Is your feature request related to a problem? Please describe.

I'm using FastCSV to split massive CSV files (~1 billion rows) into multiple output files, grouped by key columns. When reading with CsvRecord.getFields() and writing with CsvWriter.writeRecord(Iterable<String>), JFR allocation profiling shows ~1 TiB of cumulative iterator allocations.

The issue:

  1. CsvRecord.getFields() wraps the internal array: Collections.unmodifiableList(Arrays.asList(fields))
  2. CsvWriter.writeRecord(Iterable<String>) creates an iterator for each row
  3. For 1 billion rows, this creates ~1 billion short-lived iterator objects
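The chain above can be reproduced with a stdlib-only sketch (plain Java, no FastCSV internals): the unmodifiable list view that getFields() effectively returns hands out a fresh Iterator object on every iteration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class IteratorAllocationDemo {

    // Mirrors what CsvRecord.getFields() effectively does:
    // wrap the internal array in an unmodifiable view.
    static List<String> wrapFields(String[] fields) {
        return Collections.unmodifiableList(Arrays.asList(fields));
    }

    // Each enhanced-for loop over the view (as inside
    // writeRecord(Iterable<String>)) calls iterator() and
    // allocates a new short-lived wrapper object.
    static boolean iteratorsAreDistinct(List<String> view) {
        Iterator<String> first = view.iterator();
        Iterator<String> second = view.iterator();
        return first != second;
    }

    public static void main(String[] args) {
        List<String> view = wrapFields(new String[] {"a", "b", "c"});
        System.out.println(iteratorsAreDistinct(view)); // prints "true"
    }
}
```

Multiply that per-row wrapper by a billion rows and the JFR numbers above follow.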

Describe the solution you'd like

Add a zero-allocation path by providing any one of these:

  1. CsvWriter.writeRecord(CsvRecord record) - access the internal fields array directly. This would also avoid the Collections$UnmodifiableRandomAccessList and Arrays$ArrayList allocations that Collections.unmodifiableList(Arrays.asList(fields)) creates
  2. CsvRecord.getFieldsArray() - expose internal array for use with writeRecord(String... values)
  3. Smart iterator check - if the Iterable is a List, use indexed access instead of creating an iterator
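Option 3 might look roughly like this inside the writer. writeFieldsSketch and the comma joining are hypothetical stand-ins for FastCSV's actual field handling (which does real escaping/quoting); only the dispatch is the point here.

```java
import java.util.List;
import java.util.RandomAccess;

public class SmartIterableSketch {

    // Hypothetical stand-in for the writer's per-field loop;
    // joining with ',' is a placeholder for real CSV quoting.
    @SuppressWarnings("unchecked")
    static String writeFieldsSketch(Iterable<String> fields) {
        StringBuilder sb = new StringBuilder();
        if (fields instanceof List && fields instanceof RandomAccess) {
            // Indexed access: no Iterator allocated. Restricting to
            // RandomAccess avoids O(n^2) get(i) on linked lists.
            List<String> list = (List<String>) fields;
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(list.get(i));
            }
        } else {
            // Fallback path: still allocates one iterator per call.
            boolean first = true;
            for (String field : fields) {
                if (!first) sb.append(',');
                sb.append(field);
                first = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeFieldsSketch(List.of("a", "b", "c")));
    }
}
```

Since getFields() already returns a RandomAccess list view, this check alone would remove the per-row iterator without any public API change.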

Describe alternatives you've considered

Current workarounds:

  • record.toArray(new String[0]) - cuts allocations by roughly half but still allocates one array per row
  • Indexed field-by-field writing with CsvWriterRecord - allocates CsvWriterRecord objects instead
  • Reflection to access the internal fields array - works, but fragile and ugly
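When the column count is stable, the first workaround can be tightened by reusing one preallocated array per writer (a sketch; the buffer and its sizing are my own, not FastCSV API):

```java
import java.util.List;

public class ReusedBufferSketch {

    // List.toArray(T[]) fills the supplied array in place when it is
    // at least as long as the list, so a correctly sized buffer gets
    // reused instead of a new String[] being allocated per row.
    static boolean fillsInPlace(List<String> row, String[] buffer) {
        return row.toArray(buffer) == buffer;
    }

    public static void main(String[] args) {
        String[] buffer = new String[3]; // sized once for the column count
        List<String> row = List.of("x", "y", "z");
        System.out.println(fillsInPlace(row, buffer)); // prints "true"
        // writer.writeRecord(buffer) would follow here (hypothetical usage)
    }
}
```

This avoids the per-row array, but it still pays for the getFields() list wrapper, which is why a direct writeRecord(CsvRecord) path would be preferable.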

brunnsbe · Dec 07 '25 19:12