parquet-dotnet
Support RLE encoding in data pages for bools
Issue description
My reading of the Parquet format suggests to me that it is allowable to use Run Length Encoding / Bit-Packing Hybrid (RLE = 3) encoding for bool values. With heavily repeated values RLE can be a lot more efficient than bit packing even for bools. I believe it is valid in both v1 and v2 files.
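To give a rough sense of the size difference, here is a minimal Python sketch that is not taken from parquet-dotnet. It assumes the run layout described in the Parquet encodings documentation for bit width 1: a ULEB128 varint header of `run_length << 1`, followed by the repeated value in one byte. The function names are hypothetical, the encoder emits only RLE runs (a real encoder would switch to bit-packed runs for short runs), and it leaves out any length prefix a data page may require.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def rle_encode_bools(values: list[bool]) -> bytes:
    """Encode bools as a sequence of RLE runs with bit width 1.

    Each run is: varint(run_length << 1), then the value padded to one byte.
    The low header bit is 0, which marks the run as RLE rather than bit-packed.
    """
    out = bytearray()
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        out += encode_varint((j - i) << 1)
        out.append(1 if values[i] else 0)
        i = j
    return bytes(out)


def plain_encode_bools(values: list[bool]) -> bytes:
    """Bit-pack bools, one bit per value, least significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for idx, value in enumerate(values):
        if value:
            out[idx >> 3] |= 1 << (idx & 7)
    return bytes(out)


if __name__ == "__main__":
    column = [True] * 10_000 + [False] * 10_000
    print(len(plain_encode_bools(column)), "bytes bit-packed")
    print(len(rle_encode_bools(column)), "bytes as RLE runs")
```

For two long runs like this the RLE output is a handful of bytes, while bit packing always costs one bit per value. On data that alternates frequently, this RLE-only sketch would be larger than bit packing, which is why the hybrid encoding allows both kinds of run.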
As an alternative, because parquet-dotnet already supports RLE/BitPacking for dictionary indexes, dictionary encoding for bools would give comparable size savings on heavily repeated values, with slightly more overhead. However, although the format spec does not appear to prohibit it, "parquet-mr" does not allow dictionary encoding for bools (I suspect because it supports RLE/BitPacking encoding, which will always be more efficient than a dictionary of bools anyway). For compatibility reasons it's probably better to just support RLE/BitPacking for bool values and use that instead.