365 comments by QP Hou

I think you are spot on with your analysis @roeap :) I don't have much to add. In step 3, we can also include `optimize`. I also think it would...

> on how far we want to take this within this crate. I want to take it very far, to the point where we can use it to process delta...

Databricks still hasn't open-sourced this feature, I believe.

welp, we can always reverse engineer the format if anyone is interested in doing that :D

Yeah, unfortunately, datafusion uses the arrow parquet readers, which only support local files at the moment: https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/parquet.rs#L181. I think this is best handled in the Rust parquet reader with minor adjustments...
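A minimal sketch of what "minor adjustments" could look like, assuming the idea is to make the reader generic over `Read + Seek` instead of tying it to a local file path. The function `read_footer_len` and the constant `PARQUET_MAGIC` are hypothetical illustrations, not part of the arrow or parquet crates. It relies only on the Parquet file layout, where the file ends with the serialized metadata, a 4-byte little-endian metadata length, and the magic bytes `PAR1`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic bytes that open and close every Parquet file (hypothetical constant name).
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: reads the length of the Parquet footer metadata
/// from any seekable byte source, not only a local file.
///
/// File tail layout: [metadata][4-byte little-endian length]["PAR1"].
fn read_footer_len<R: Read + Seek>(reader: &mut R) -> io::Result<u32> {
    // The last 8 bytes hold the metadata length followed by the magic.
    // Seeking before the start (source shorter than 8 bytes) returns an error.
    reader.seek(SeekFrom::End(-8))?;
    let mut tail = [0u8; 8];
    reader.read_exact(&mut tail)?;

    // Reject sources that do not end with the Parquet magic bytes.
    if tail[4..] != PARQUET_MAGIC[..] {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing PAR1 footer magic",
        ));
    }

    // First 4 bytes of the tail: metadata length, little-endian.
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}
```

The same function accepts `std::fs::File` for local paths or `std::io::Cursor<Vec<u8>>` over bytes fetched from a remote store such as S3; depending on traits rather than a filesystem path is what lets one reader serve both.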

@meastham feel free to start a discussion about S3 support in the upstream datafusion GitHub repo or on the arrow dev mailing list.

@gopik yes, we are waiting on upstream object store support for S3. The datafusion execution plan integration is complete except for partition column support, which should be fairly straightforward...

@gopik it will be part of datafusion, see https://github.com/apache/arrow-datafusion/issues/907

@mosyp do you know how to reproduce this bug?

Thanks @mosyp for the explanation. I am still not fully following this line:

> we end up with removing metadata from schema for every tombstone, even if some of them...