Ryan Blue
Metadata currently tracks all files using full paths. This is costly when compression is not used (as in the current metadata files) and doesn't allow a table to be easily relocated....
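A minimal sketch of the relocation idea: if metadata stored paths relative to the table's root location, moving the table would only require changing the root. The class and method names below are illustrative, not Iceberg's actual API.

```java
// Hypothetical sketch: resolve a stored path against the table root so the
// table can be relocated by changing only the root location.
public class RelativePaths {
  static String resolve(String tableRoot, String storedPath) {
    // Absolute paths (legacy full-path metadata) are returned unchanged.
    if (storedPath.startsWith("hdfs://")
        || storedPath.startsWith("s3://")
        || storedPath.startsWith("/")) {
      return storedPath;
    }
    // Relative paths are joined to the current table root.
    return tableRoot.endsWith("/") ? tableRoot + storedPath : tableRoot + "/" + storedPath;
  }
}
```

With relative paths, `resolve("s3://new-bucket/db/t", "data/f.parquet")` points at the relocated file without rewriting any metadata.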
The SparkOrcWriter defines converters that could easily throw exceptions when a required column has a null value, like this:

```java
static class RequiredIntConverter implements Converter {
  public void addValue(int rowId, ...
```
Distinct counts aren't very valuable to cost-based optimization because they can't be easily merged. They should be removed. As a replacement, look into storing HLL buffers if they aren't too...
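A small example of why plain distinct counts don't merge: two files can each report a distinct count of 3, but the distinct count of their union is anywhere from 3 (identical value sets) to 6 (disjoint value sets), so no merge rule can recover it from the counts alone. A mergeable sketch like HLL avoids this by unioning its buffers.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: the true distinct count of a union depends on the values, not on
// the per-file counts, so per-file distinct counts cannot be merged.
public class DistinctCountMerge {
  static long distinctOfUnion(int[] a, int[] b) {
    Set<Integer> union = new HashSet<>();
    for (int v : a) union.add(v);
    for (int v : b) union.add(v);
    return union.size();
  }
}
```

`distinctOfUnion({1,2,3}, {1,2,3})` is 3 while `distinctOfUnion({1,2,3}, {4,5,6})` is 6, even though both inputs report distinctCount = 3.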
This PR builds on #447 and updates the properties to use "row group" instead of "block", because "block" is confusing. It also fixes the outstanding review comments so this can...
This is a minor fix. When opening a HadoopInputFile, use its configuration for read options.
This adds an evaluator, `ParquetIndexPageFilter.EvalVisitor` to evaluate an Iceberg Expression against Parquet's page indexes. That produces Parquet's `RowRanges`, which track ranges of rows that should be read from the Parquet...
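As a rough illustration of what row-range evaluation involves (this is a toy, not Parquet's actual `RowRanges` class): each predicate yields a sorted list of surviving row ranges, and an AND expression intersects them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a sorted, non-overlapping list of inclusive row ranges, and
// the intersection used when AND-ing the ranges from two predicates.
public class RowRangesSketch {
  static class Range {
    final long start;
    final long end; // inclusive
    Range(long start, long end) { this.start = start; this.end = end; }
  }

  static List<Range> intersect(List<Range> a, List<Range> b) {
    List<Range> result = new ArrayList<>();
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
      Range r1 = a.get(i);
      Range r2 = b.get(j);
      long start = Math.max(r1.start, r2.start);
      long end = Math.min(r1.end, r2.end);
      if (start <= end) {
        result.add(new Range(start, end)); // overlapping portion survives
      }
      // advance whichever range ends first
      if (r1.end < r2.end) { i++; } else { j++; }
    }
    return result;
  }
}
```

For example, intersecting `[0,99], [200,299]` with `[50,249]` yields `[50,99], [200,249]`, the rows both predicates allow.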
Moving to the planned reader adds default value support. This is the same basic change as in #9366 and #11108.
This adds a blob type to the Puffin spec that can store a Roaring bitmap delete vector. This is in support of the row-level delete improvements proposed for Iceberg v3.
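A minimal sketch of the delete-vector idea: a bitmap keyed by row position within a data file, where a set bit means the row is deleted. The spec stores a Roaring bitmap in a Puffin blob; `java.util.BitSet` below is only a stand-in to show the read-path check, and the class is hypothetical.

```java
import java.util.BitSet;

// Sketch: position-based delete vector for one data file. A reader skips
// any row whose position bit is set.
public class DeleteVectorSketch {
  private final BitSet deleted = new BitSet();

  void delete(long pos) {
    deleted.set((int) pos); // Roaring bitmaps handle the full long range
  }

  boolean isDeleted(long pos) {
    return deleted.get((int) pos);
  }

  boolean isLive(long pos) {
    return !isDeleted(pos);
  }
}
```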
This is based on the changes to the Puffin spec in #11238.
### Rationale for this change

Updates the Variant and shredding specs based on a thorough review.

### What changes are included in this PR?

Spec updates, mostly to the shredding spec...