Include file stats when converting a parquet directory to a Delta table

Open gruuya opened this issue 9 months ago • 0 comments

Description

Currently, the ConvertToDeltaBuilder skips fetching and populating the file stats: https://github.com/delta-io/delta-rs/blob/81593e919497221a1a08bf8db9d20e8e4a39a8a6/crates/core/src/operations/convert_to_delta.rs#L332-L353

This results in log files missing the min/max/null count statistics.
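For reference, the Delta protocol stores per-file statistics as a JSON string in the `stats` field of each `add` action, with the keys `numRecords`, `minValues`, `maxValues` and `nullCount`. The sketch below shows roughly what that string would look like once populated. It is only an illustration: `ColumnStats` and `stats_json` are hypothetical names, not part of delta-rs, and it assumes integer columns whose names need no JSON escaping.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-column statistics for one data file.
/// Integer columns only, to keep the sketch small.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
    pub null_count: u64,
}

/// Render one JSON object body like `"a":1,"b":2` from the chosen field.
/// Assumption: column names contain no characters that need escaping.
fn join_fields(
    columns: &BTreeMap<String, ColumnStats>,
    value: fn(&ColumnStats) -> String,
) -> String {
    columns
        .iter()
        .map(|(name, stats)| format!("\"{}\":{}", name, value(stats)))
        .collect::<Vec<_>>()
        .join(",")
}

/// Build the `stats` string for an `add` action from per-column statistics.
pub fn stats_json(num_records: u64, columns: &BTreeMap<String, ColumnStats>) -> String {
    format!(
        "{{\"numRecords\":{},\"minValues\":{{{}}},\"maxValues\":{{{}}},\"nullCount\":{{{}}}}}",
        num_records,
        join_fields(columns, |s| s.min.to_string()),
        join_fields(columns, |s| s.max.to_string()),
        join_fields(columns, |s| s.null_count.to_string()),
    )
}
```

In a real implementation the values would presumably come from the parquet footer metadata of each file, and a JSON library would handle escaping and non-integer types.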

Use Case

These stats are useful because they allow file pruning (data skipping) during query planning, and thus improve read performance.

Granted, it may be possible to use the stats from the files themselves, but that is sub-optimal compared to reading them from the log directly.
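To make the use case concrete, here is a minimal sketch of how a reader could use min/max bounds from the log to decide which files to scan for a range filter. The names `FileEntry` and `files_to_scan` are hypothetical and do not reflect how delta-rs or any query engine implements pruning. Note that a file whose log entry has no stats can never be skipped, which is the situation converted tables are in today.

```rust
/// Hypothetical view of one log entry: a file path plus the min/max
/// bounds of a single integer column. `None` models an entry written
/// without stats.
#[derive(Debug, Clone)]
pub struct FileEntry {
    pub path: String,
    pub bounds: Option<(i64, i64)>,
}

/// Return the paths of files that may contain rows with `lo <= value <= hi`.
pub fn files_to_scan(files: &[FileEntry], lo: i64, hi: i64) -> Vec<&str> {
    files
        .iter()
        .filter(|f| match f.bounds {
            // Keep the file only if its [min, max] range overlaps [lo, hi].
            Some((min, max)) => max >= lo && min <= hi,
            // No stats in the log: the file cannot be skipped safely.
            None => true,
        })
        .map(|f| f.path.as_str())
        .collect()
}
```

With stats present, files outside the filter range are dropped using the log alone, without opening any parquet footer.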

Related Issue(s)

gruuya, May 08 '24 15:05