Szehon Ho

Results 9 issues of Szehon Ho

Expire snapshots can take a long time for large tables with millions of files and thousands of snapshots/manifests. One cause is the calculation of files to be deleted. The current...

spark
core
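
As a rough illustration of what "calculating the files to be deleted" involves, here is a minimal sketch. It assumes a heavily simplified model in which each snapshot ID maps directly to the set of file paths it references. The function name, the data layout, and the file names are all hypothetical. This is not Iceberg's implementation and not the approach proposed in the issue above.

```python
from typing import Dict, Iterable, Set


def files_safe_to_delete(
    snapshot_files: Dict[int, Set[str]], expired_ids: Iterable[int]
) -> Set[str]:
    """Return files referenced only by expired snapshots (hypothetical helper)."""
    expired = set(expired_ids)
    candidate_files: Set[str] = set()  # files reachable from expired snapshots
    retained_files: Set[str] = set()   # files still reachable from kept snapshots
    for snapshot_id, files in snapshot_files.items():
        if snapshot_id in expired:
            candidate_files |= files
        else:
            retained_files |= files
    # A file may be removed only if no retained snapshot still references it.
    return candidate_files - retained_files


# Toy data: three snapshots with overlapping file sets (assumed values).
snapshots = {
    1: {"a.parquet", "b.parquet"},
    2: {"b.parquet", "c.parquet"},
    3: {"c.parquet", "d.parquet"},
}
print(sorted(files_safe_to_delete(snapshots, [1])))
```

In this toy model the cost grows with the total number of file references across all snapshots, which hints at why the calculation can become slow for tables with millions of files.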

Closes #4362. This adds the following columns to all files tables:
- column_sizes_metrics
- value_counts_metrics
- null_value_counts_metrics
- nan_value_counts_metrics
- lower_bounds_metrics
- upper_bounds_metrics

This is to keep backward compatibility as the...

spark
core

This exposes position deletes as a metadata table "position_deletes", with schema: file, pos, row, partition. This will be useful when trying to implement "RewritePositionDeleteFiles", as we will read positional deletes...

spark
arrow
core
data
build

### Why are the changes needed? Support SPJ one-side shuffle if other side has partition transform expression ### How was this patch tested? New unit test in KeyGroupedPartitioningSuite ### Was...

SQL
CORE

### What changes were proposed in this pull request? This is the final planned SPJ scenario: auto-shuffle one side + fewer join keys than partition keys. Background: - Auto-shuffle works...

SQL

### What changes were proposed in this pull request? In Spark SQL, add 'WITH OPTIONS' syntax to support dynamic relation options. This is a continuation of https://github.com/apache/spark/pull/41683 based on @cloud-fan's...

SQL

In recent releases of Iceberg-Spark, the Spark defaults for distribution mode have changed, through PRs such as https://github.com/apache/iceberg/pull/7637 (and previous ones). I have seen many questions regarding why an...

docs

### What changes were proposed in this pull request? Add Update support in DataFrameWriterV2 ### Why are the changes needed? Spark currently supports the UPDATE SQL statement. We want DataFrame to...

SQL
CONNECT

This is the spec change for https://github.com/apache/iceberg/issues/10260. It is also closely based on the decisions taken in the corresponding Parquet proposal: https://github.com/apache/parquet-format/pull/240

spark
Specification