Scott Sandre comments




FlinkDeltaSink: Add column stats from the parquet file to delta transaction log

Posting for visibility: this PR is currently blocked by https://github.com/delta-io/connectors/pull/425, as we need a way to provide options to the sink API to be able to wrap this feature behind...
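For context, the Delta protocol stores per-file statistics as a JSON string in the AddFile `stats` field, using the keys `numRecords`, `minValues`, `maxValues`, and `nullCount`. The sketch below computes those values from in-memory rows. It is only an illustration: reading the stats from the Parquet footer, as this PR does, is not shown, and `compute_stats` is a hypothetical helper rather than part of the Flink sink.

```python
import json
from typing import Any, Dict, List


def compute_stats(rows: List[Dict[str, Any]], columns: List[str]) -> str:
    """Hypothetical helper: build a Delta-style stats JSON string for one data file.

    Stats are computed from in-memory rows here; a real sink would read them
    from the Parquet file footer instead.
    """
    min_values: Dict[str, Any] = {}
    max_values: Dict[str, Any] = {}
    null_count: Dict[str, int] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        null_count[col] = len(values) - len(non_null)
        if non_null:  # min/max are omitted when a column is entirely null
            min_values[col] = min(non_null)
            max_values[col] = max(non_null)
    stats = {
        "numRecords": len(rows),
        "minValues": min_values,
        "maxValues": max_values,
        "nullCount": null_count,
    }
    return json.dumps(stats)


rows = [{"id": 3, "name": "b"}, {"id": 1, "name": None}, {"id": 2, "name": "a"}]
print(compute_stats(rows, ["id", "name"]))
```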

Improve deterministic guarantees implementations, or at least update API docs

@horizonzy - when we create the checkpoint, we use `snapshot.allFilesScala` (see `Checkpoints.scala`). To generate `snapshot.allFilesScala`, we perform an in-memory log replay, in which we keep track of the AddFiles seen so far using...
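A simplified sketch of in-memory log replay, assuming actions arrive as hypothetical `(kind, path)` tuples in commit order rather than Delta Standalone's action classes, and ignoring tombstone retention: the latest action for each path wins, and the paths still added at the end are the live files. The `sort_output` flag is also hypothetical. It illustrates that the iteration order of whatever collection tracks live files is an implementation detail, whereas sorting by path gives every reader the same order.

```python
from typing import Iterable, List, Set, Tuple

Action = Tuple[str, str]  # hypothetical shape: ("add" | "remove", file path)


def replay(actions: Iterable[Action], sort_output: bool = True) -> List[str]:
    """Replay actions in commit order and return the paths of live files."""
    live: Set[str] = set()
    for kind, path in actions:
        if kind == "add":
            live.add(path)
        elif kind == "remove":
            live.discard(path)  # a remove cancels any earlier add of the same path
        else:
            raise ValueError(f"unknown action kind: {kind}")
    # Set iteration order is arbitrary; sorting makes the output deterministic.
    return sorted(live) if sort_output else list(live)


log = [("add", "b.parquet"), ("add", "a.parquet"), ("remove", "b.parquet"), ("add", "c.parquet")]
print(replay(log))
```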

Improve deterministic guarantees implementations, or at least update API docs

@horizonzy - perhaps. But this is only one delta client. We don't know who wrote the previous json files or checkpoints. If another delta client wrote the previous checkpoint, and...

Improve deterministic guarantees implementations, or at least update API docs

Hi @horizonzy - we would need all clients to do this. Else, when you read a checkpoint, you wouldn't know which client wrote it and if it is sorted. Enforcing...

Improve deterministic guarantees implementations, or at least update API docs

@horizonzy can you partition your data? We provide `snapshot.scan(Expression)` APIs that let you prune by partition.
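To illustrate what partition pruning buys: files whose partition values fail the predicate are skipped without being opened. This is a plain-Python stand-in, not the Delta Standalone `snapshot.scan(Expression)` API; `FileEntry` and `prune` are hypothetical names. Partition values are kept as strings, matching how the Delta log records them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileEntry:
    """Hypothetical stand-in for an AddFile: a path plus its partition values."""
    path: str
    partition_values: Dict[str, str]


def prune(files: List[FileEntry],
          predicate: Callable[[Dict[str, str]], bool]) -> List[FileEntry]:
    """Keep only the files whose partition values satisfy the predicate."""
    return [f for f in files if predicate(f.partition_values)]


files = [
    FileEntry("date=2022-01-01/part-0.parquet", {"date": "2022-01-01"}),
    FileEntry("date=2022-01-02/part-0.parquet", {"date": "2022-01-02"}),
    FileEntry("date=2022-01-02/part-1.parquet", {"date": "2022-01-02"}),
]
matched = prune(files, lambda pv: pv["date"] == "2022-01-02")
print([f.path for f in matched])
```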

Improve deterministic guarantees implementations, or at least update API docs

@horizonzy what if we added an API/config that sorted the data on read? Also, what would we sort it by?
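A sketch of what such a sort-on-read option might look like, assuming files are exposed as dicts with `path` and `modificationTime` (a hypothetical shape; `modificationTime` is an AddFile field in the Delta protocol). `list_files` and its `sort_key` parameter are hypothetical. The key is left to the caller because choosing it is the open question here.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

FileInfo = Dict[str, Any]  # hypothetical shape: {"path": ..., "modificationTime": ...}


def list_files(files: Iterable[FileInfo],
               sort_key: Optional[Callable[[FileInfo], Any]] = None) -> List[FileInfo]:
    """Return files in their original order, or sorted by a caller-supplied key."""
    result = list(files)
    if sort_key is not None:
        result.sort(key=sort_key)  # stable sort: ties keep their original relative order
    return result


files = [
    {"path": "c.parquet", "modificationTime": 100},
    {"path": "a.parquet", "modificationTime": 300},
    {"path": "b.parquet", "modificationTime": 200},
]
by_path = list_files(files, sort_key=lambda f: f["path"])
# Adding path as a tiebreaker makes the order total, so equal timestamps still sort deterministically.
by_time = list_files(files, sort_key=lambda f: (f["modificationTime"], f["path"]))
```

Sorting by a unique key such as the path makes the result fully deterministic; a non-unique key such as the timestamp needs a tiebreaker to give the same guarantee.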

Delta Standalone: Stat serialization/deserialization inconsistent

Hi @gopik - sorry for the delay. I've confirmed this issue myself, too. Would you like to copy over https://github.com/delta-io/delta/blob/master/core/src/test/scala/org/apache/spark/sql/delta/ActionSerializerSuite.scala from the delta-io/delta repository, and perhaps investigate the fix?
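The linked `ActionSerializerSuite` checks that actions survive a serialize/deserialize round trip. A minimal sketch of the same property applied to a stats object, in plain Python with `json`; the `Stats` class and `assert_round_trip` are hypothetical illustrations, not Delta Standalone classes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Stats:
    """Hypothetical per-file stats holder used to illustrate a round-trip check."""
    numRecords: int
    minValues: Dict[str, Any] = field(default_factory=dict)
    maxValues: Dict[str, Any] = field(default_factory=dict)
    nullCount: Dict[str, int] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(s: str) -> "Stats":
        return Stats(**json.loads(s))


def assert_round_trip(stats: Stats) -> None:
    """Deserializing must reproduce the original, and reserializing must be stable."""
    encoded = stats.to_json()
    decoded = Stats.from_json(encoded)
    assert decoded == stats, f"round trip changed value: {stats} -> {decoded}"
    assert decoded.to_json() == encoded, "serialization is not stable"


assert_round_trip(Stats(numRecords=2, minValues={"id": 1}, maxValues={"id": 5}, nullCount={"id": 0}))
```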
