parquet-rs
HDFS Support?
I'm curious if there are plans to support HDFS within this crate?
The Java parquet library allows parquet files to be read locally or from HDFS and in both cases it is possible to push down the projection and only retrieve the columns needed, which can make a huge difference in performance.
Yes, that's in the plan. I took a brief look at the hdfs crate, and I think it should be relatively easy to have HdfsFile implement the Read trait. With that, it seems to require only a small code change.
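A minimal sketch of that idea, assuming the HDFS client can do positional reads. The PositionalRead trait, HdfsFile, and MockHdfs below are hypothetical names for illustration, not the hdfs crate's actual API. The wrapper also implements Seek, since a Parquet reader generally needs random access to reach the footer at the end of the file:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for an HDFS client handle (not the real `hdfs` crate API).
/// Assumes the client can read a byte range starting at an arbitrary offset.
pub trait PositionalRead {
    /// Reads up to `buf.len()` bytes starting at `offset`; returns bytes read.
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize>;
    /// Total file length in bytes.
    fn len(&self) -> u64;
}

/// Hypothetical `HdfsFile`: keeps a cursor so positional reads can back
/// the standard `Read` and `Seek` traits.
pub struct HdfsFile<C: PositionalRead> {
    client: C,
    pos: u64,
}

impl<C: PositionalRead> HdfsFile<C> {
    pub fn new(client: C) -> Self {
        HdfsFile { client, pos: 0 }
    }
}

impl<C: PositionalRead> Read for HdfsFile<C> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Read from the current cursor position, then advance it.
        let n = self.client.read_at(self.pos, buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<C: PositionalRead> Seek for HdfsFile<C> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let len = self.client.len() as i64;
        let new_pos = match pos {
            SeekFrom::Start(off) => off as i64,
            SeekFrom::End(off) => len + off,
            SeekFrom::Current(off) => self.pos as i64 + off,
        };
        if new_pos < 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "seek before start of file",
            ));
        }
        self.pos = new_pos as u64;
        Ok(self.pos)
    }
}

/// In-memory mock used in place of a real HDFS connection.
pub struct MockHdfs(pub Vec<u8>);

impl PositionalRead for MockHdfs {
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize> {
        // Clamp to the end of the data so reads past EOF return 0 bytes.
        let start = (offset as usize).min(self.0.len());
        let n = buf.len().min(self.0.len() - start);
        buf[..n].copy_from_slice(&self.0[start..start + n]);
        Ok(n)
    }
    fn len(&self) -> u64 {
        self.0.len() as u64
    }
}
```

Because HdfsFile only relies on the standard traits, any code that is generic over Read + Seek could consume it the same way it consumes a local std::fs::File.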
Projection pushdown is orthogonal to the HDFS issue, I think. It is already supported by the current record reader.
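To illustrate why projection is independent of where the bytes come from, here is a toy columnar layout (hypothetical, and not parquet-rs's record reader API or Parquet's real encoding). Footer-style metadata records each column chunk's offset and length, so a reader can fetch only the byte ranges of the requested columns, whether the source is a local file or HDFS:

```rust
use std::collections::HashMap;

/// Hypothetical metadata for one column chunk: where its bytes live.
#[derive(Clone, Debug)]
pub struct ColumnChunk {
    pub name: String,
    pub offset: usize,
    pub length: usize,
}

/// Toy columnar "file": raw bytes plus chunk metadata (mimics the idea, not Parquet).
pub struct ToyColumnarFile {
    pub bytes: Vec<u8>,
    pub chunks: Vec<ColumnChunk>,
}

impl ToyColumnarFile {
    /// Lays out each column's bytes contiguously, one column after another.
    pub fn build(columns: &[(&str, Vec<u8>)]) -> Self {
        let mut bytes = Vec::new();
        let mut chunks = Vec::new();
        for (name, data) in columns {
            chunks.push(ColumnChunk {
                name: name.to_string(),
                offset: bytes.len(),
                length: data.len(),
            });
            bytes.extend_from_slice(data);
        }
        ToyColumnarFile { bytes, chunks }
    }

    /// Reads only the projected columns and reports how many bytes were
    /// fetched; skipped columns are never touched.
    pub fn read_projected(&self, projection: &[&str]) -> (HashMap<String, Vec<u8>>, usize) {
        let mut out = HashMap::new();
        let mut bytes_read = 0;
        for chunk in &self.chunks {
            if projection.contains(&chunk.name.as_str()) {
                let data = self.bytes[chunk.offset..chunk.offset + chunk.length].to_vec();
                bytes_read += data.len();
                out.insert(chunk.name.clone(), data);
            }
        }
        (out, bytes_read)
    }
}
```

With columns of 8, 100, and 4 bytes, projecting the first and last reads 12 bytes instead of 112; over a network filesystem that saving is what makes pushdown worthwhile.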