parquetjs
Statistics
Follow-up to https://github.com/ironSource/parquetjs/pull/52.
Calculate statistics for each page and each column, including `max_value`, `min_value`, `null_count` and `distinct_count`. For columns that are sorted, statistics at either the column level or the page level allow a reader to skip over sections that are not of interest.
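To illustrate the skipping, here is a minimal sketch of a reader that uses page-level min/max values to decide which pages a range query has to decode. The `PageMinMax` type and the `pagesToRead` function are hypothetical names for illustration, not part of the parquetjs API, and the sketch assumes non-null numeric pages.

```typescript
// Hypothetical shape for illustration; not the parquetjs API.
interface PageMinMax {
  min: number;
  max: number;
}

// Return the indices of pages whose [min, max] range can overlap the
// query range [lo, hi]; every other page can be skipped without decoding it.
function pagesToRead(pages: PageMinMax[], lo: number, hi: number): number[] {
  const result: number[] = [];
  pages.forEach((page, i) => {
    if (page.max >= lo && page.min <= hi) {
      result.push(i);
    }
  });
  return result;
}

// For a sorted column, consecutive pages cover increasing, non-overlapping
// ranges, so a narrow query touches only a few pages.
const sortedPages: PageMinMax[] = [
  { min: 0, max: 9 },
  { min: 10, max: 19 },
  { min: 20, max: 29 },
];
console.log(pagesToRead(sortedPages, 12, 15)); // [ 1 ]
```

The overlap check is valid for any column, but on an unsorted column each page tends to span most of the value range, so few pages can be skipped. That is why the benefit shows up mainly for sorted columns.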
Improved tests are still required: they should capture statistics that differ across pages and row groups, and should include null-value and unique-value counts.
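As a sketch of what such a test could check, the example below builds two pages with different ranges, nulls and a repeated value. It uses hypothetical names (`Stats`, `computeStats`) whose fields only mirror the statistic names above; it is not the parquetjs implementation. It shows that `min_value`, `max_value` and `null_count` roll up from pages to the column chunk, while `distinct_count` cannot simply be summed.

```typescript
// Hypothetical statistics shape; field names mirror the PR text, not a real API.
interface Stats {
  min_value: number | null;
  max_value: number | null;
  null_count: number;
  distinct_count: number;
}

// Compute statistics over one page (or one column chunk) of nullable values.
function computeStats(values: Array<number | null>): Stats {
  const present = values.filter((v): v is number => v !== null);
  const min = present.reduce<number | null>((m, v) => (m === null || v < m ? v : m), null);
  const max = present.reduce<number | null>((m, v) => (m === null || v > m ? v : m), null);
  return {
    min_value: min,
    max_value: max,
    null_count: values.length - present.length,
    distinct_count: new Set(present).size,
  };
}

// Two pages whose statistics differ, as the requested tests should cover.
const pageA: Array<number | null> = [1, 2, 2, null];
const pageB: Array<number | null> = [2, 5, null, null];
const a = computeStats(pageA);
const b = computeStats(pageB);
const column = computeStats([...pageA, ...pageB]);

// min, max and null_count roll up from page level to column level...
console.log(Math.min(a.min_value!, b.min_value!) === column.min_value); // true
console.log(Math.max(a.max_value!, b.max_value!) === column.max_value); // true
console.log(a.null_count + b.null_count === column.null_count); // true
// ...but distinct_count does not: the value 2 appears in both pages.
console.log(a.distinct_count + b.distinct_count, column.distinct_count); // 4 3
```

A test that only used identical pages would not catch an implementation that sums per-page distinct counts.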
Not ready to merge: `max_value` and `min_value` have to be encoded with the column encoding.
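The note does not spell out the byte layout. For reference, the Parquet format specification stores the `min_value`/`max_value` statistics in PLAIN encoding for the column's physical type (little-endian for fixed-width types), except that variable-length byte arrays carry no length prefix. A minimal sketch under that assumption, covering only a few physical types; `encodeStatValue` and `PhysicalType` are hypothetical names, not parquetjs functions:

```typescript
// Hypothetical subset of Parquet physical types; INT64, BOOLEAN, etc. omitted.
type PhysicalType = "INT32" | "FLOAT" | "DOUBLE" | "BYTE_ARRAY";

// Encode one min/max statistic value as PLAIN-encoded bytes.
function encodeStatValue(type: PhysicalType, value: number | Uint8Array): Uint8Array {
  switch (type) {
    case "INT32": {
      const view = new DataView(new ArrayBuffer(4));
      view.setInt32(0, value as number, true); // little-endian
      return new Uint8Array(view.buffer);
    }
    case "FLOAT": {
      const view = new DataView(new ArrayBuffer(4));
      view.setFloat32(0, value as number, true); // IEEE 754 single, little-endian
      return new Uint8Array(view.buffer);
    }
    case "DOUBLE": {
      const view = new DataView(new ArrayBuffer(8));
      view.setFloat64(0, value as number, true); // IEEE 754 double, little-endian
      return new Uint8Array(view.buffer);
    }
    case "BYTE_ARRAY":
      // Raw bytes with no 4-byte length prefix, unlike PLAIN data-page values.
      return value as Uint8Array;
    default:
      throw new Error(`unsupported physical type: ${type}`);
  }
}

console.log(Array.from(encodeStatValue("INT32", 1))); // [ 1, 0, 0, 0 ]
```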
Hi,
I see this PR has been pending for almost a year now. Do you need any help? I can test locally or contribute if there's more to do.
Is there anything I could do to help with this PR?