# Eliminate iterator allocations when writing CsvRecord
## Is your feature request related to a problem? Please describe.
I'm using FastCSV to split massive CSV files (~1 billion rows) into multiple files by grouping rows by key columns. When reading with `CsvRecord.getFields()` and writing with `CsvWriter.writeRecord(Iterable<String>)`, I observe ~1 TiB of iterator allocations (via JFR profiling).
The issue (sketched in code after this list):

- `CsvRecord.getFields()` wraps the internal array: `Collections.unmodifiableList(Arrays.asList(fields))`
- `CsvWriter.writeRecord(Iterable<String>)` creates an iterator for each row
- For 1 billion rows, this creates ~1 billion short-lived iterator objects
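For context, a minimal sketch of the hot path, assuming the FastCSV 3.x builder API; file names and the actual grouping logic are placeholders:

```java
import java.io.IOException;
import java.nio.file.Path;

import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRecord;
import de.siegmar.fastcsv.writer.CsvWriter;

public final class CopyRows {

    public static void main(final String[] args) throws IOException {
        try (CsvReader<CsvRecord> reader = CsvReader.builder().ofCsvRecord(Path.of("in.csv"));
             CsvWriter writer = CsvWriter.builder().build(Path.of("out.csv"))) {
            for (final CsvRecord record : reader) {
                // getFields() wraps the internal String[] in two wrapper lists,
                // and writeRecord(Iterable) pulls a fresh Iterator from the
                // result -- several short-lived objects on every row.
                writer.writeRecord(record.getFields());
            }
        }
    }
}
```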
## Describe the solution you'd like
Add a zero-allocation path by providing any one of these:

- `CsvWriter.writeRecord(CsvRecord record)` - directly access the internal fields array. With this we can also avoid the allocations that `Collections.unmodifiableList(Arrays.asList(fields))` creates, namely `Collections$UnmodifiableRandomAccessList` and `Arrays$ArrayList`
- `CsvRecord.getFieldsArray()` - expose the internal array for use with `writeRecord(String... values)`
- Smart iterator check - if the `Iterable` is a `List`, use indexed access instead of an iterator (see the sketch after this list)
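To illustrate the third option, here is a rough sketch of what the check inside `CsvWriter.writeRecord(Iterable<String>)` could look like; `writeField` and `endRecord` are hypothetical stand-ins for whatever internals FastCSV actually uses:

```java
// Hypothetical replacement body for CsvWriter.writeRecord(Iterable<String>);
// needs java.util.List and java.util.RandomAccess.
public CsvWriter writeRecord(final Iterable<String> values) throws IOException {
    if (values instanceof List && values instanceof RandomAccess) {
        // Indexed access: no Iterator is allocated for this row.
        final List<String> list = (List<String>) values;
        for (int i = 0, size = list.size(); i < size; i++) {
            writeField(list.get(i)); // hypothetical internal helper
        }
    } else {
        // General fallback: one short-lived Iterator per call, as today.
        for (final String value : values) {
            writeField(value);
        }
    }
    endRecord(); // hypothetical internal helper
    return this;
}
```

Since `Arrays.asList(...)` and its unmodifiable wrapper both implement `RandomAccess`, this branch would apply to the list returned by `getFields()`; it removes the per-row iterator, though not the wrapper lists themselves.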
## Describe alternatives you've considered
Current workarounds:

- `record.toArray(new String[0])` - reduces allocation by ~50% but still allocates arrays (see the sketch below)
- Indexed field-by-field writing with `CsvWriterRecord` - creates `CsvWriterRecord` objects instead
- Reflection to access `fields` - works but fragile and ugly
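For reference, the first workaround in code, assuming the varargs `writeRecord(String...)` overload and the reader/writer from the first sketch:

```java
for (final CsvRecord record : reader) {
    // Copying to a String[] avoids writeRecord(Iterable)'s per-row Iterator,
    // but still allocates a fresh array (plus getFields()'s wrappers) per row.
    writer.writeRecord(record.getFields().toArray(new String[0]));
}
```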