parquet-rs

HDFS Support?

Open • andygrove opened this issue 6 years ago • 1 comment

I'm curious if there are plans to support HDFS within this crate?

The Java Parquet library can read Parquet files either locally or from HDFS. In both cases it can push down the projection and retrieve only the columns needed, which can make a huge difference in performance.

andygrove, Apr 17 '18 23:04

Yes, that's in the plan. I took a brief look at the hdfs crate, and I think it should be relatively easy to have HdfsFile implement the Read trait. With that in place, HDFS support seems to require only a small code change.
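To illustrate the idea, here is a minimal sketch using only the standard library. The HdfsFile type below is a hypothetical in-memory stand-in, not the real type from the hdfs crate, and starts_with_parquet_magic is an illustrative helper rather than a parquet-rs function. The only Parquet fact it relies on is that a Parquet file begins with the 4-byte magic "PAR1".

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for an HDFS file handle. A real handle would come
/// from an HDFS client crate; here the bytes live in memory so the sketch
/// compiles and runs without a cluster.
pub struct HdfsFile {
    data: Vec<u8>,
    pos: usize,
}

impl HdfsFile {
    pub fn from_bytes(data: Vec<u8>) -> Self {
        HdfsFile { data, pos: 0 }
    }
}

impl Read for HdfsFile {
    /// Copy as many bytes as fit into `buf` and advance the cursor.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = &self.data[self.pos..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Ok(n)
    }
}

/// Code written against `Read` does not care where the bytes come from:
/// a local file, an in-memory cursor, or an HDFS handle all work.
pub fn starts_with_parquet_magic<R: Read>(reader: &mut R) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    Ok(&magic == b"PAR1")
}
```

The same helper accepts a std::fs::File or a std::io::Cursor unchanged, which is why implementing Read on the HDFS handle would plausibly keep the change to the reader small.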

I think projection pushdown is orthogonal to the HDFS issue. It is already supported by the current record reader.
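As a rough illustration of why the two concerns are independent, the toy sketch below reads only the requested columns from any source that supports reading and seeking. ColumnChunk and read_projected are hypothetical names made up for this example and are not the parquet-rs record reader API. The layout is also simplified: each column is assumed to be one contiguous byte range.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Toy description of where one column's bytes live in a file.
/// Hypothetical type for illustration; not part of parquet-rs.
pub struct ColumnChunk {
    pub name: &'static str,
    pub offset: u64,
    pub len: usize,
}

/// Read only the columns named in `projection`, never touching the bytes
/// of the other columns. The reader can be backed by local disk, memory,
/// or a remote file system; this function cannot tell the difference.
pub fn read_projected<R: Read + Seek>(
    reader: &mut R,
    chunks: &[ColumnChunk],
    projection: &[&str],
) -> io::Result<Vec<(&'static str, Vec<u8>)>> {
    let mut out = Vec::new();
    for chunk in chunks.iter().filter(|c| projection.contains(&c.name)) {
        // Jump straight to the column's byte range and read exactly that.
        reader.seek(SeekFrom::Start(chunk.offset))?;
        let mut buf = vec![0u8; chunk.len];
        reader.read_exact(&mut buf)?;
        out.push((chunk.name, buf));
    }
    Ok(out)
}
```

Because the projection logic depends only on the Read and Seek traits, swapping the storage backend would not need to change it, under the assumption that the backend's file handle also supports seeking.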

sunchao, Apr 18 '18 04:04