iceberg-rust
Concurrent data file fetching and parallel RecordBatch processing
This brings some big performance gains vs the previous sequential batch processing. On my 12-core Ryzen 9 5900X, I see all 12 cores hitting about 50% utilization.
In my perf testing branch for this, a full table scan retrieved 84 million rows in 7s, or over 11M rows/sec. Real-world throughput could be quite a bit higher, as 50% of the CPU usage was MinIO serving up the data files.
As with the concurrent file plan PR, the concurrency config has been set to fast defaults based on testing a range of values but can be user-configured.
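To illustrate the general idea of bounded-concurrency file fetching, here is a minimal sketch using only the standard library. It is not the code in this PR: it swaps in plain threads and a channel, and `scan_files`, `fetch_file`, `Batch`, and `concurrency_limit` are hypothetical names made up for this example rather than iceberg-rust APIs.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a decoded batch of rows from one data file.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub file_id: usize,
    pub rows: usize,
}

/// Hypothetical stand-in for fetching and decoding one data file.
/// Here every file "decodes" into two batches of 1000 rows.
fn fetch_file(file_id: usize) -> Vec<Batch> {
    vec![
        Batch { file_id, rows: 1000 },
        Batch { file_id, rows: 1000 },
    ]
}

/// Fetch all files with at most `concurrency_limit` files in flight,
/// returning batches in whatever order the workers produce them.
pub fn scan_files(file_ids: Vec<usize>, concurrency_limit: usize) -> Vec<Batch> {
    // Always run at least one worker, even if the limit is given as 0.
    let workers = concurrency_limit.max(1);

    // Shared work queue: each worker pulls the next file id when it is free.
    let queue = Arc::new(Mutex::new(file_ids.into_iter()));
    let (tx, rx) = mpsc::channel::<Batch>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking the next id.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(id) => {
                        for batch in fetch_file(id) {
                            if tx.send(batch).is_err() {
                                return; // receiver is gone, stop early
                            }
                        }
                    }
                    None => break, // no files left
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);

    let batches: Vec<Batch> = rx.into_iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    batches
}
```

Calling `scan_files((0..8).collect(), 4)` would process eight files with four workers. The limit caps how many files are fetched at once, which is the kind of knob the concurrency config exposes, although the real implementation and its default values may differ.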
Performance test results, generated using the tests in https://github.com/apache/iceberg-rust/pull/497: