[Gorilla](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf) is a de facto encoding algorithm for floating-point numbers; it has been used by many time series databases such as InfluxDB and TimescaleDB for a while. For now Parquet only...
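For readers unfamiliar with the paper, the core of Gorilla-style float compression is an XOR step: each value is XORed with its predecessor and only the meaningful bits of the result are stored. The sketch below is a standalone illustration of that step, not parquet-mr code; the class name and output format are made up.

```java
// Illustrative sketch of the XOR step behind Gorilla-style float encoding
// (not parquet-mr code): consecutive doubles are XORed bit-for-bit, and
// only the meaningful (non-zero) bits of the XOR result need to be stored.
public final class GorillaXorSketch {

    public static void main(String[] args) {
        double[] series = {12.0, 12.0, 12.5, 13.0};
        long previous = Double.doubleToLongBits(series[0]);
        System.out.println("first value stored verbatim: " + series[0]);

        for (int i = 1; i < series.length; i++) {
            long current = Double.doubleToLongBits(series[i]);
            long xor = previous ^ current;
            if (xor == 0) {
                // identical value: Gorilla stores a single control bit
                System.out.println(series[i] + " -> repeat (1 control bit)");
            } else {
                // otherwise store the leading-zero count, the length, and the meaningful bits
                int leading = Long.numberOfLeadingZeros(xor);
                int trailing = Long.numberOfTrailingZeros(xor);
                int meaningful = 64 - leading - trailing;
                System.out.println(series[i] + " -> " + meaningful
                        + " meaningful XOR bits (leading=" + leading
                        + ", trailing=" + trailing + ")");
            }
            previous = current;
        }
    }
}
```

Slowly changing series produce XOR results with long runs of zero bits, which is why this works well for time series metrics.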
I often need to create tens of millions of small dataframes and save them into parquet files. All these dataframes have the same column and index information, and normally they...
Hi parquet-mr team, when I read the parquet writer in [ParquetFileWriter]( I find that there is no `column metadata` behind each column chunk as described in the [Parquet-Format]( and...
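For context, parquet-mr readers today obtain the per-chunk `ColumnMetaData` from the file footer rather than from bytes trailing each column chunk. A rough sketch of reading it with the parquet-mr API follows; the file path and Hadoop configuration are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class FooterMetadataSketch {
    public static void main(String[] args) throws Exception {
        // "example.parquet" is a placeholder path
        HadoopInputFile file =
                HadoopInputFile.fromPath(new Path("example.parquet"), new Configuration());
        try (ParquetFileReader reader = ParquetFileReader.open(file)) {
            // Column chunk metadata is taken from the footer, not from bytes
            // written after each column chunk inside the row groups.
            for (BlockMetaData rowGroup : reader.getFooter().getBlocks()) {
                for (ColumnChunkMetaData column : rowGroup.getColumns()) {
                    System.out.println(column.getPath() + " starts at " + column.getStartingPos()
                            + ", total size " + column.getTotalSize());
                }
            }
        }
    }
}
```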
Int8 and Int16 are not supported as basic types in previous versions. Using 4 bytes to store an int8 does not seem like a good idea, which means requiring more storage and read and...
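As a point of reference, narrow integers are currently declared as an INT32 physical column carrying an integer logical type annotation, so each value occupies 4 bytes before encoding. A small sketch using the parquet-mr schema builder is below; the column names are made up.

```java
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class NarrowIntSchemaSketch {
    public static void main(String[] args) {
        // int8 and int16 columns are logical annotations on an INT32
        // physical column, i.e. 4 bytes per value before encoding kicks in.
        MessageType schema = Types.buildMessage()
                .required(PrimitiveTypeName.INT32)
                    .as(LogicalTypeAnnotation.intType(8, true))
                    .named("tiny_col")
                .required(PrimitiveTypeName.INT32)
                    .as(LogicalTypeAnnotation.intType(16, true))
                    .named("small_col")
                .named("example");
        System.out.println(schema);
    }
}
```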
Since this is a parquet-specific encoder, it would be good to have a more complete description of the encoding/decoding, so that implementations have an easier time implementing it. **Reporter**: [Jorge...
The Nested Encoding section of the documentation doesn't escape the `_` character, so it looks as follows: Two encodings for the levels are supported BIT_PACKED and RLE. Only RLE is now...
In the example using delta encoding, encoding [1, 2, 3, 4, 5], we state that ```java The final encoded data is: header: 8 (block size), 1 (miniblock count), 5 (value count),...
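To make the arithmetic behind that example easier to follow, here is a standalone sketch that computes the quantities DELTA_BINARY_PACKED derives from [1, 2, 3, 4, 5]; it is an illustration of the delta/min-delta/bit-width steps, not parquet-mr's encoder.

```java
public class DeltaExampleSketch {
    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5};
        long firstValue = values[0];

        // deltas between consecutive values: [1, 1, 1, 1]
        long[] deltas = new long[values.length - 1];
        long minDelta = Long.MAX_VALUE;
        for (int i = 1; i < values.length; i++) {
            deltas[i - 1] = values[i] - values[i - 1];
            minDelta = Math.min(minDelta, deltas[i - 1]);
        }

        // after subtracting the minimum delta every element is 0,
        // so the miniblock can be bit-packed with a bit width of 0
        int maxBitWidth = 0;
        for (long delta : deltas) {
            long adjusted = delta - minDelta;
            maxBitWidth = Math.max(maxBitWidth, 64 - Long.numberOfLeadingZeros(adjusted));
        }

        System.out.println("first value: " + firstValue);   // 1
        System.out.println("min delta:   " + minDelta);     // 1
        System.out.println("bit width:   " + maxBitWidth);  // 0 -> no packed bytes needed
    }
}
```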
Currently ColumnMetaData only contains bloom_filter_offset, which points to the BloomFilterHeader followed by the bloom filter data. This solution is not optimal during reading, as two I/O reads are needed once we...
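A hedged sketch of why a length field helps: with only `bloom_filter_offset`, a reader must first fetch and decode the header to learn how many bytes the bitset occupies, then issue a second read for the bitset; with a known total length it can issue a single ranged read. The interfaces below are illustrative stand-ins, not parquet-mr types.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

public class BloomFilterReadSketch {

    /** Illustrative ranged-read interface; not a parquet-mr type. */
    interface RangedReader {
        ByteBuffer read(long offset, int length) throws IOException;
    }

    /** Illustrative stand-in for decoding the Thrift BloomFilterHeader. */
    interface HeaderDecoder {
        int headerLength(ByteBuffer headerBytes);
        int bitsetLength(ByteBuffer headerBytes);
    }

    // With only bloom_filter_offset: first read the header to learn the
    // bitset size, then issue a second read for the bitset itself.
    static ByteBuffer readWithOffsetOnly(RangedReader io, HeaderDecoder decoder, long offset)
            throws IOException {
        ByteBuffer headerBytes = io.read(offset, 64); // 64 is an arbitrary upper bound for the header
        long bitsetOffset = offset + decoder.headerLength(headerBytes);
        return io.read(bitsetOffset, decoder.bitsetLength(headerBytes)); // second I/O
    }

    // With a length field in ColumnMetaData as well: a single ranged read
    // covers the header and the bitset together.
    static ByteBuffer readWithOffsetAndLength(RangedReader io, long offset, int length)
            throws IOException {
        return io.read(offset, length);
    }
}
```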
As I understand it, Parquet is a write-once format, so mutating data inside Parquet files is not an option. Now there is a new cross-EU law coming into effect...
It would be great if Parquet stored `dictionary entries` for columns marked to be used for joins. When a column is used for a join (it could be a...