Chao Sun
I'm trying to use this with Hadoop 2.7.2. Compiled the native lib on Mac. `cargo build` works fine, but when I run `exec.sh cargo test` I get this: ``` ➜ hdfs-rs (master) ✗...
The goal of this task is to implement basic put & get functionality for the DB, using existing components such as the memtable, log reader/writer, write batch, etc.
In order to read into Arrow format, we need to add a `get_spaced` method (borrowed from the C++ version) to the decoder that leaves spaces for null values, in the...
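To make the "leave spaces for null values" idea concrete, here is a minimal sketch in Rust. It is not the actual decoder API: the function name `spread_with_nulls`, its signature, and the use of an LSB-first validity bitmap are all assumptions made for illustration. It takes a buffer whose front holds the densely decoded non-null values and spreads them in place so that each value lands in its final slot.

```rust
/// Hypothetical helper (not part of any real crate): `buffer` starts with the
/// non-null values packed densely at the front. After the call,
/// `buffer[..num_values]` has each value in its final slot and `T::default()`
/// in every null slot.
/// `valid_bits` is assumed to be an LSB-first bitmap: bit i set = slot i is non-null.
/// Returns the number of non-null values.
fn spread_with_nulls<T: Copy + Default>(
    buffer: &mut [T],
    num_values: usize,
    valid_bits: &[u8],
) -> usize {
    assert!(num_values <= buffer.len());
    assert!(valid_bits.len() * 8 >= num_values);

    let is_valid = |i: usize| (valid_bits[i / 8] >> (i % 8)) & 1 == 1;
    let non_null = (0..num_values).filter(|&i| is_valid(i)).count();

    // Walk backwards so a dense value is never overwritten before it is moved.
    let mut src = non_null;
    for dst in (0..num_values).rev() {
        if is_valid(dst) {
            src -= 1;
            buffer[dst] = buffer[src];
        } else {
            buffer[dst] = T::default();
        }
    }
    non_null
}
```

For example, with a buffer starting as `[10, 20, 30, _, _]` and slots 0, 2 and 4 marked valid, the values end up at indices 0, 2 and 4, which matches a layout where values and validity are stored separately.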
This is the umbrella ticket to track adding Apache Arrow support. Tasks:
- [ ] Add Arrow schema converter for read path (#185).
- [ ] Add Arrow schema converter...
One of the major nightly features we rely on, `impl Trait`, has been stabilized in the latest 1.26 release. We should see if it's easy to remove the nightly dependency and...
[PARQUET-906](https://issues.apache.org/jira/browse/PARQUET-906) introduced a new logical type representation that we should consider upgrading to. Parquet-MR is also going to [work on this](https://issues.apache.org/jira/browse/PARQUET-1253). This requires us to upgrade to Parquet format 2.5.0.
Similar to [PARQUET-684](https://issues.apache.org/jira/browse/PARQUET-684), dictionary encoding can potentially be improved using SIMD instructions such as `gather`. There's a [blog post](https://lemire.me/blog/2016/08/25/faster-dictionary-decoding-with-simd-instructions/) that describes this idea, and a [prototype](https://github.com/lemire/dictionary).
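For reference, the operation a SIMD `gather` would speed up is the per-value dictionary lookup. Below is a minimal scalar sketch of that lookup in Rust. The function name `decode_dictionary` and its signature are hypothetical and not taken from this codebase. A gather-based version would load several dictionary entries per instruction from a vector of indices instead of one at a time.

```rust
/// Scalar dictionary decode: out[i] = dictionary[indices[i]].
/// Hypothetical signature for illustration; returns None if any index is
/// out of range instead of panicking.
fn decode_dictionary<T: Copy>(dictionary: &[T], indices: &[u32]) -> Option<Vec<T>> {
    indices
        .iter()
        // Bounds-checked lookup; one load per index. This is the loop body
        // that a SIMD gather would batch.
        .map(|&idx| dictionary.get(idx as usize).copied())
        .collect()
}
```

A scalar baseline like this is also handy for checking the output of any vectorized version.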
[Parquet encoding format](https://github.com/apache/parquet-format/blob/master/Encodings.md) specifies that booleans should be encoded as bit-packed, LSB first. The current implementation just encodes each one as a plain 1-bit value. It is a little confusing though,...
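As an illustration of what "bit-packed, LSB first" means, here is a minimal sketch in Rust. The function names are hypothetical and this is not the crate's encoder. Value `i` goes into bit `i % 8` of byte `i / 8`, so the first value occupies the least significant bit of the first byte.

```rust
/// Pack booleans eight per byte, least significant bit first.
/// Unused high bits of the last byte are left as zero.
fn pack_bools_lsb_first(values: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Inverse of `pack_bools_lsb_first`; `num_values` says how many bits are meaningful.
fn unpack_bools_lsb_first(bytes: &[u8], num_values: usize) -> Vec<bool> {
    assert!(bytes.len() * 8 >= num_values);
    (0..num_values)
        .map(|i| (bytes[i / 8] >> (i % 8)) & 1 == 1)
        .collect()
}
```

For instance, the nine values `true, false, true, true, false, false, false, false, true` pack into the two bytes `0x0D, 0x01`.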
It may be useful to profile encoding & decoding. We can use the existing benchmarks for this. Some useful scripts: For CPU: ``` perf record -g -F 1000...
This is preliminary work needed before we can implement reading of Parquet files. We may just follow what Impala & Parquet-cpp do (https://github.com/apache/parquet-cpp/blob/master/src/parquet/util/memory.h), but adapted to Rust's semantics.