
Statistics

Open ZJONSSON opened this issue 7 years ago • 4 comments

Subsequent to https://github.com/ironSource/parquetjs/pull/52

Calculate statistics for each page and each column, including: max_value, min_value, null_count, distinct_count. For any columns that are sorted, the statistics at either the column level or the page level allow a reader to skip over sections that are not of interest.
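As a rough illustration of the idea above, here is a minimal sketch of computing the four statistics for one page of values and then using per-page min/max to decide which pages a range query needs to read. The names `PageStatistics`, `computePageStatistics`, and `pagesToRead` are hypothetical and not part of the parquetjs API; the sketch assumes a numeric column with nulls represented as `null`, and it counts distinct values among non-null entries only.

```typescript
// Hypothetical shape for illustration only; not the parquetjs API.
interface PageStatistics {
  min_value: number | null; // null when the page holds only nulls
  max_value: number | null;
  null_count: number;
  distinct_count: number; // distinct non-null values (an assumption)
}

// Compute statistics for the values of a single page.
function computePageStatistics(values: Array<number | null>): PageStatistics {
  let min: number | null = null;
  let max: number | null = null;
  let nullCount = 0;
  const distinct = new Set<number>();

  for (const v of values) {
    if (v === null) {
      nullCount++;
      continue;
    }
    if (min === null || v < min) min = v;
    if (max === null || v > max) max = v;
    distinct.add(v);
  }

  return {
    min_value: min,
    max_value: max,
    null_count: nullCount,
    distinct_count: distinct.size,
  };
}

// Return the indexes of pages whose [min_value, max_value] range
// overlaps the query range [lo, hi]; all other pages can be skipped.
function pagesToRead(stats: PageStatistics[], lo: number, hi: number): number[] {
  const keep: number[] = [];
  for (let i = 0; i < stats.length; i++) {
    const s = stats[i];
    // A page with no non-null values cannot match a value range.
    if (s.min_value === null || s.max_value === null) continue;
    if (s.max_value >= lo && s.min_value <= hi) keep.push(i);
  }
  return keep;
}
```

The overlap test is correct whether or not the column is sorted. On a sorted column the page ranges do not interleave, so most pages fall outside the query range and can be skipped.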

ZJONSSON avatar Feb 15 '18 00:02 ZJONSSON

Improved tests required: they should capture statistics that differ across pages and row groups, and should cover null-value and unique-value counts.

ZJONSSON avatar Feb 18 '18 23:02 ZJONSSON

Not ready to merge: max_value and min_value have to be encoded with the column encoding.
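To make the point concrete, here is a minimal sketch of what encoding a min or max value as bytes could look like, assuming an INT32 column written with PLAIN encoding (a 4-byte little-endian integer). The helper names `encodeInt32Plain` and `decodeInt32Plain` are hypothetical; the actual change would need to go through whatever codec the column uses rather than these helpers.

```typescript
// Hypothetical helpers for illustration; assumes an INT32 column
// with PLAIN encoding (4 bytes, little-endian).
function encodeInt32Plain(value: number): Uint8Array {
  const bytes = new Uint8Array(4);
  // The third argument `true` selects little-endian byte order.
  new DataView(bytes.buffer).setInt32(0, value, true);
  return bytes;
}

function decodeInt32Plain(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getInt32(0, true);
}

// Statistics would then carry bytes rather than raw numbers.
const encodedStats = {
  min_value: encodeInt32Plain(-2),
  max_value: encodeInt32Plain(1000),
};
```

A reader would decode `min_value` and `max_value` with the same routine before comparing them against a query range.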

ZJONSSON avatar Feb 28 '18 01:02 ZJONSSON

Hi,

I see this PR has been pending for almost a year now. Do you need any help? I can test locally or contribute if there's more to do.

hadrienk avatar Feb 11 '19 13:02 hadrienk

Is there anything I could do to help with this PR?

dobesv avatar Nov 29 '19 23:11 dobesv