[Gorilla](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf) is a de facto encoding algorithm for floating-point numbers; it has been used by many time series databases such as InfluxDB and TimescaleDB for a while. For now Parquet only...
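For readers unfamiliar with the paper, the core of Gorilla-style float compression is an XOR step: each value is XORed with its predecessor and only the meaningful bits of the result are stored. The sketch below is a standalone illustration of that step, not parquet-mr code; the class name and output format are made up.

```java
// Illustrative sketch of the XOR step behind Gorilla-style float encoding
// (not parquet-mr code): consecutive doubles are XORed bit-for-bit, and
// only the meaningful (non-zero) bits of the XOR result need to be stored.
public final class GorillaXorSketch {

    public static void main(String[] args) {
        double[] series = {12.0, 12.0, 12.5, 13.0};
        long previous = Double.doubleToLongBits(series[0]);
        System.out.println("first value stored verbatim: " + series[0]);

        for (int i = 1; i < series.length; i++) {
            long current = Double.doubleToLongBits(series[i]);
            long xor = previous ^ current;
            if (xor == 0) {
                // identical value: Gorilla stores a single control bit
                System.out.println(series[i] + " -> repeat (1 control bit)");
            } else {
                // otherwise store the leading-zero count, the length, and the meaningful bits
                int leading = Long.numberOfLeadingZeros(xor);
                int trailing = Long.numberOfTrailingZeros(xor);
                int meaningful = 64 - leading - trailing;
                System.out.println(series[i] + " -> " + meaningful
                        + " meaningful XOR bits (leading=" + leading
                        + ", trailing=" + trailing + ")");
            }
            previous = current;
        }
    }
}
```

Slowly changing series produce XOR results with long runs of zero bits, which is why this works well for time series metrics.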
I often need to create tens of millions of small dataframes and save them into parquet files. All these dataframes have the same column and index information, and normally they...
Hi parquet-mr team, when I read the parquet writer in [ParquetFileWriter]( I find that there is no `column metadata` behind each column chunk as described in the [Parquet-Format]( and...
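For context, parquet-mr readers today obtain the per-chunk `ColumnMetaData` from the file footer rather than from bytes trailing each column chunk. A rough sketch of reading it with the parquet-mr API follows; the file path and Hadoop configuration are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class FooterMetadataSketch {
    public static void main(String[] args) throws Exception {
        // "example.parquet" is a placeholder path
        HadoopInputFile file =
                HadoopInputFile.fromPath(new Path("example.parquet"), new Configuration());
        try (ParquetFileReader reader = ParquetFileReader.open(file)) {
            // Column chunk metadata is taken from the footer, not from bytes
            // written after each column chunk inside the row groups.
            for (BlockMetaData rowGroup : reader.getFooter().getBlocks()) {
                for (ColumnChunkMetaData column : rowGroup.getColumns()) {
                    System.out.println(column.getPath() + " starts at " + column.getStartingPos()
                            + ", total size " + column.getTotalSize());
                }
            }
        }
    }
}
```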
Int8 and Int16 are not supported as basic types in previous versions. Using 4 bytes to store an int8 does not seem like a good idea, which means requiring more storage and read and...
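As a point of reference, narrow integers are currently declared as an INT32 physical column carrying an integer logical type annotation, so each value occupies 4 bytes before encoding. A small sketch using the parquet-mr schema builder is below; the column names are made up.

```java
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class NarrowIntSchemaSketch {
    public static void main(String[] args) {
        // int8 and int16 columns are logical annotations on an INT32
        // physical column, i.e. 4 bytes per value before encoding kicks in.
        MessageType schema = Types.buildMessage()
                .required(PrimitiveTypeName.INT32)
                    .as(LogicalTypeAnnotation.intType(8, true))
                    .named("tiny_col")
                .required(PrimitiveTypeName.INT32)
                    .as(LogicalTypeAnnotation.intType(16, true))
                    .named("small_col")
                .named("example");
        System.out.println(schema);
    }
}
```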
Since this is a parquet-specific encoder, it would be good to have a more complete description of the encoding/decoding, so that implementations have an easier time implementing it. **Reporter**: [Jorge...
The Nested Encoding section of the documentation doesn't escape the `_` character, so it looks as follows: Two encodings for the levels are supported BIT_PACKED and RLE. Only RLE is now...
In the example using delta encoding, encoding [1, 2, 3, 4, 5], we state that ```java The final encoded data is: header: 8 (block size), 1 (miniblock count), 5 (value count),...
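To make the arithmetic behind that example easier to follow, here is a standalone sketch that computes the quantities DELTA_BINARY_PACKED derives from [1, 2, 3, 4, 5]; it is an illustration of the delta/min-delta/bit-width steps, not parquet-mr's encoder.

```java
public class DeltaExampleSketch {
    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5};
        long firstValue = values[0];

        // deltas between consecutive values: [1, 1, 1, 1]
        long[] deltas = new long[values.length - 1];
        long minDelta = Long.MAX_VALUE;
        for (int i = 1; i < values.length; i++) {
            deltas[i - 1] = values[i] - values[i - 1];
            minDelta = Math.min(minDelta, deltas[i - 1]);
        }

        // after subtracting the minimum delta every element is 0,
        // so the miniblock can be bit-packed with a bit width of 0
        int maxBitWidth = 0;
        for (long delta : deltas) {
            long adjusted = delta - minDelta;
            maxBitWidth = Math.max(maxBitWidth, 64 - Long.numberOfLeadingZeros(adjusted));
        }

        System.out.println("first value: " + firstValue);   // 1
        System.out.println("min delta:   " + minDelta);     // 1
        System.out.println("bit width:   " + maxBitWidth);  // 0 -> no packed bytes needed
    }
}
```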
Currently ColumnMetaData only contains bloom_filter_offset, which points to the BloomFilterHeader followed by the bloom filter data. This solution is not optimal during reading, as two I/O reads are needed once we...
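A hedged sketch of why a length field helps: with only `bloom_filter_offset`, a reader must first fetch and decode the header to learn how many bytes the bitset occupies, then issue a second read for the bitset; with a known total length it can issue a single ranged read. The interfaces below are illustrative stand-ins, not parquet-mr types.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

public class BloomFilterReadSketch {

    /** Illustrative ranged-read interface; not a parquet-mr type. */
    interface RangedReader {
        ByteBuffer read(long offset, int length) throws IOException;
    }

    /** Illustrative stand-in for decoding the Thrift BloomFilterHeader. */
    interface HeaderDecoder {
        int headerLength(ByteBuffer headerBytes);
        int bitsetLength(ByteBuffer headerBytes);
    }

    // With only bloom_filter_offset: first read the header to learn the
    // bitset size, then issue a second read for the bitset itself.
    static ByteBuffer readWithOffsetOnly(RangedReader io, HeaderDecoder decoder, long offset)
            throws IOException {
        ByteBuffer headerBytes = io.read(offset, 64); // 64 is an arbitrary upper bound for the header
        long bitsetOffset = offset + decoder.headerLength(headerBytes);
        return io.read(bitsetOffset, decoder.bitsetLength(headerBytes)); // second I/O
    }

    // With a length field in ColumnMetaData as well: a single ranged read
    // covers the header and the bitset together.
    static ByteBuffer readWithOffsetAndLength(RangedReader io, long offset, int length)
            throws IOException {
        return io.read(offset, length);
    }
}
```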
As I understand it, Parquet is a write-once format, so mutating data inside Parquet files is not an option. Now there is a new cross-EU law coming into effect...
It would be great if Parquet stored `dictionary entries` for columns marked to be used for joins. When a column is used for a join (it could be a...