
Improve performance of InternalParquetRecordReader (1%)

Open jerolba opened this issue 9 months ago • 0 comments

Describe the enhancement requested

While profiling the loading of a Parquet file with Java Mission Control, I noticed that the LongStream used in InternalParquetRecordReader consumes a noticeable amount of time.

This LongStream can be replaced with a simpler iterator over long values that counts from 0 to pages.getRowCount().
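A minimal sketch of what such an iterator could look like. The class name `RowIndexIterator` is hypothetical, and the sketch assumes the reader only needs a `PrimitiveIterator.OfLong` that yields the same values as `LongStream.range(0, rowCount).iterator()`, with the upper bound excluded. It is not the code from the test project or from parquet-java.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical helper: yields 0, 1, ..., rowCount - 1 without going through the Stream machinery.
// Assumption: the upper bound is exclusive, as with LongStream.range(0, rowCount).
final class RowIndexIterator implements PrimitiveIterator.OfLong {
    private final long end; // exclusive upper bound
    private long next;      // next value to return

    RowIndexIterator(long rowCount) {
        this.end = rowCount;
        this.next = 0L;
    }

    @Override
    public boolean hasNext() {
        return next < end;
    }

    @Override
    public long nextLong() {
        if (next >= end) {
            throw new NoSuchElementException();
        }
        return next++;
    }
}
```

Under that assumption, `new RowIndexIterator(pages.getRowCount())` would stand in for the stream-backed iterator, and each call is a comparison and an increment on a plain field.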

To measure the overhead, I've created a test project that replaces the InternalParquetRecordReader implementation with one that uses a long iterator: https://github.com/jerolba/parquet-rowindexiterator

The execution time varies with the state of the JVM, but running the benchmark multiple times shows that the LongStream version is consistently slower than the long iterator version, by between 1% and 4% depending on the run.

Component(s)

No response

jerolba, May 25 '25 17:05