QP Hou
I think you are spot on with your analysis @roeap :) I don't have much to add. In step 3, we can also include `optimize`. I also think it would...
> on how far we want to take this within this crate. I want to take it very far, to the point where we can use it to process delta...
Databricks still hasn't open-sourced this feature I believe.
welp, we can always reverse-engineer the format if anyone is interested in doing that :D
Yeah, unfortunately, datafusion uses the arrow parquet reader, which only supports local files at the moment: https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/parquet.rs#L181. I think this is best handled by the Rust parquet reader with minor adjustments...
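For context, the coupling to local files comes from the reader opening a filesystem path directly. The "minor adjustment" amounts to reading through a random-access byte source instead of a path, so any backend (local disk, S3, in-memory) can plug in. A minimal std-only sketch of that idea, with all names hypothetical (this is not the actual arrow/datafusion API):

```rust
use std::io::{Cursor, Read, Result, Seek};

// Hypothetical abstraction: a parquet reader only needs random-access
// reads over bytes, not a local file path. Any backend that can hand
// out a Read + Seek handle could back the reader.
trait StorageBackend {
    type Reader: Read + Seek;
    fn open(&self, path: &str) -> Result<Self::Reader>;
}

// In-memory backend standing in for a remote store like S3.
struct InMemoryBackend {
    data: Vec<u8>,
}

impl StorageBackend for InMemoryBackend {
    type Reader = Cursor<Vec<u8>>;
    fn open(&self, _path: &str) -> Result<Self::Reader> {
        Ok(Cursor::new(self.data.clone()))
    }
}

fn main() -> Result<()> {
    // Parquet files start and end with the magic bytes "PAR1";
    // here we only store the header to keep the sketch tiny.
    let backend = InMemoryBackend { data: b"PAR1".to_vec() };
    let mut reader = backend.open("s3://bucket/table/part-0.parquet")?;
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    assert_eq!(&magic, b"PAR1");
    println!("read magic: {}", String::from_utf8_lossy(&magic));
    Ok(())
}
```

With an abstraction like this, "s3 support" reduces to providing one more `StorageBackend` implementation rather than changing the reader itself.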
@meastham feel free to start a discussion for s3 support in the upstream datafusion github repo or in the arrow dev mailing list.
@gopik yes, we are blocked on upstream object store support for s3. The datafusion execution plan integration is complete except for partition column support, which should be fairly straightforward...
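Why partition column support is "fairly straightforward": Delta tables store Hive-style partition values in the file path (`col=value` segments), so the integration mostly needs to parse those segments and append them as constant columns to each batch read from the file. A std-only sketch of the parsing half, with the helper name being hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical helper: extract Hive-style partition values
// (e.g. "date=2021-01-01/region=us") from a data file's path.
// An execution plan would then attach these as constant columns
// to every record batch produced from that file.
fn parse_partition_values(path: &str) -> HashMap<String, String> {
    path.split('/')
        .filter_map(|segment| {
            let mut parts = segment.splitn(2, '=');
            match (parts.next(), parts.next()) {
                // Keep only "key=value" segments; plain file names
                // like "part-00000.parquet" fall through to None.
                (Some(k), Some(v)) if !k.is_empty() => {
                    Some((k.to_string(), v.to_string()))
                }
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let path = "date=2021-01-01/region=us/part-00000.parquet";
    let values = parse_partition_values(path);
    assert_eq!(values.get("date").map(String::as_str), Some("2021-01-01"));
    assert_eq!(values.get("region").map(String::as_str), Some("us"));
    println!("{:?}", values);
}
```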
@gopik it will be part of datafusion, see https://github.com/apache/arrow-datafusion/issues/907
@mosyp do you know how to reproduce this bug?
Thanks @mosyp for the explanation, I am still not fully following this line. > we end up with removing metadata from schema for every tombstone, even if some of them...