QP Hou
I think you are spot on with your analysis @roeap :) I don't have much to add. In step 3, we can also include `optimize`. I also think it would...
> on how far we want to take this within this crate. I want to take it very far, to the point where we can use it to process delta...
Databricks still hasn't open-sourced this feature I believe.
welp, we can always reverse-engineer the format if anyone is interested in doing that :D
Yeah, unfortunately, datafusion uses the arrow parquet reader, which only supports local files at the moment: https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/parquet.rs#L181. I think this is best handled by the Rust parquet reader with minor adjustments...
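For context, the coupling to local files comes from the reader opening a filesystem path directly. The "minor adjustment" amounts to reading through a random-access byte source instead of a path, so any backend (local disk, S3, in-memory) can plug in. A minimal std-only sketch of that idea, with all names hypothetical (this is not the actual arrow/datafusion API):

```rust
use std::io::{Cursor, Read, Result, Seek};

// Hypothetical abstraction: a parquet reader only needs random-access
// reads over bytes, not a local file path. Any backend that can hand
// out a Read + Seek handle could back the reader.
trait StorageBackend {
    type Reader: Read + Seek;
    fn open(&self, path: &str) -> Result<Self::Reader>;
}

// In-memory backend standing in for a remote store like S3.
struct InMemoryBackend {
    data: Vec<u8>,
}

impl StorageBackend for InMemoryBackend {
    type Reader = Cursor<Vec<u8>>;
    fn open(&self, _path: &str) -> Result<Self::Reader> {
        Ok(Cursor::new(self.data.clone()))
    }
}

fn main() -> Result<()> {
    // Parquet files start and end with the magic bytes "PAR1";
    // here we only store the header to keep the sketch tiny.
    let backend = InMemoryBackend { data: b"PAR1".to_vec() };
    let mut reader = backend.open("s3://bucket/table/part-0.parquet")?;
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    assert_eq!(&magic, b"PAR1");
    println!("read magic: {}", String::from_utf8_lossy(&magic));
    Ok(())
}
```

With an abstraction like this, "s3 support" reduces to providing one more `StorageBackend` implementation rather than changing the reader itself.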
@meastham feel free to start a discussion for s3 support in the upstream datafusion github repo or in the arrow dev mailing list.
@gopik yes, we are blocked on upstream object store support for s3. The datafusion execution plan integration is complete except for partition column support, which should be fairly straightforward...
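Why partition column support is "fairly straightforward": Delta tables store Hive-style partition values in the file path (`col=value` segments), so the integration mostly needs to parse those segments and append them as constant columns to each batch read from the file. A std-only sketch of the parsing half, with the helper name being hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical helper: extract Hive-style partition values
// (e.g. "date=2021-01-01/region=us") from a data file's path.
// An execution plan would then attach these as constant columns
// to every record batch produced from that file.
fn parse_partition_values(path: &str) -> HashMap<String, String> {
    path.split('/')
        .filter_map(|segment| {
            let mut parts = segment.splitn(2, '=');
            match (parts.next(), parts.next()) {
                // Keep only "key=value" segments; plain file names
                // like "part-00000.parquet" fall through to None.
                (Some(k), Some(v)) if !k.is_empty() => {
                    Some((k.to_string(), v.to_string()))
                }
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let path = "date=2021-01-01/region=us/part-00000.parquet";
    let values = parse_partition_values(path);
    assert_eq!(values.get("date").map(String::as_str), Some("2021-01-01"));
    assert_eq!(values.get("region").map(String::as_str), Some("us"));
    println!("{:?}", values);
}
```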
@gopik it will be part of datafusion, see https://github.com/apache/arrow-datafusion/issues/907
@mosyp do you know how to reproduce this bug?
Thanks @mosyp for the explanation, I am still not fully following this line. > we end up with removing metadata from schema for every tombstone, even if some of them...