Ryan Blue

Showing 10 issues authored by Ryan Blue

Metadata currently tracks all paths using the full path. This is costly when not using compression (like the current metadata files) and doesn't allow a table to be easily relocated....
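To illustrate the idea, here is a minimal sketch (not Iceberg's actual API) of storing paths relative to the table location so the whole table can be relocated by changing only the root; the class and method names are hypothetical.

```java
import java.net.URI;

// Sketch: keep file paths relative to the table's root location so moving the
// table only requires updating the root. Illustrative only, not Iceberg code.
class RelativePaths {
  // strip the table location prefix; URI.relativize handles the "/" boundary
  static String toRelative(String tableLocation, String filePath) {
    return URI.create(tableLocation).relativize(URI.create(filePath)).toString();
  }

  // rebuild the full path against a (possibly new) table location
  static String toAbsolute(String tableLocation, String relativePath) {
    return URI.create(tableLocation + "/").resolve(relativePath).toString();
  }
}
```

Relative paths are also shorter than repeated full paths, which matters when the metadata is not compressed.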

The SparkOrcWriter defines converters that could easily throw exceptions when a required column has a null value, like this:

```java
static class RequiredIntConverter implements Converter {
  public void addValue(int rowId, ...
```
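A minimal standalone sketch of the intended behavior (this is not the actual SparkOrcWriter code, and the `column` field is an illustrative stand-in for the writer's output vector): a required-column converter should fail with a clear message on null rather than with a bare NullPointerException.

```java
// Illustrative sketch: a converter for a required int column that rejects
// null values with a descriptive exception.
class RequiredIntConverter {
  // stand-in for the writer's output column vector
  private final int[] column;

  RequiredIntConverter(int size) {
    this.column = new int[size];
  }

  void addValue(int rowId, Integer value) {
    if (value == null) {
      throw new IllegalArgumentException(
          "Cannot write null to required int column at row " + rowId);
    }
    column[rowId] = value;
  }

  int get(int rowId) {
    return column[rowId];
  }
}
```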

Distinct counts aren't very valuable to cost-based optimization because they can't be easily merged. They should be removed. As a replacement, look into storing HLL buffers if they aren't too...
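A small worked example of why plain distinct counts don't merge: two files can each report 3 distinct values, yet the merged distinct count is anywhere from 3 to 6 depending on overlap. A structure that supports union (HLL; a plain `Set` is used here as an exact stand-in) merges correctly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Distinct counts from two files cannot be combined by addition, because the
// files may share values. Union-able sketches (HLL buffers; here an exact Set
// as a stand-in) do not have this problem.
class DistinctMerge {
  static long mergedCountNaive(long a, long b) {
    return a + b; // wrong whenever the two files share any values
  }

  static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
    Set<Integer> u = new HashSet<>(a);
    u.addAll(b); // HLL supports the same union operation, approximately
    return u;
  }
}
```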

This PR builds on #447 and updates the properties to use "row group" instead of "block", because "block" is confusing. It also fixes the outstanding review comments so this can...

This is a minor fix. When opening a HadoopInputFile, use its configuration for read options.

This adds an evaluator, `ParquetIndexPageFilter.EvalVisitor` to evaluate an Iceberg Expression against Parquet's page indexes. That produces Parquet's `RowRanges`, which track ranges of rows that should be read from the Parquet...
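The idea can be sketched as follows (illustrative only; `Page` and `rowRangesForGreaterThan` are hypothetical names, not Parquet's `RowRanges` API): each page covers a row range and carries min/max stats, so a predicate like `x > v` can only match pages whose max exceeds `v`, and only those pages' row ranges need to be read.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative page-index filtering: keep the row ranges of pages whose
// min/max stats show they might contain matching rows.
class PageIndexSketch {
  static class Page {
    final long firstRow, lastRow;
    final int min, max;

    Page(long firstRow, long lastRow, int min, int max) {
      this.firstRow = firstRow;
      this.lastRow = lastRow;
      this.min = min;
      this.max = max;
    }
  }

  // row ranges of pages that may contain rows where x > value
  static List<long[]> rowRangesForGreaterThan(List<Page> pages, int value) {
    List<long[]> ranges = new ArrayList<>();
    for (Page p : pages) {
      if (p.max > value) { // page cannot match if its max is <= value
        ranges.add(new long[] {p.firstRow, p.lastRow});
      }
    }
    return ranges;
  }
}
```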

Labels: parquet, stale

Moving to the planned reader adds default value support. This is the same basic change as in #9366 and #11108.

Labels: spark, core

This adds a blob type to the Puffin spec that can store a Roaring bitmap delete vector. This is in support of the row-level delete improvements proposed for Iceberg v3.
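Conceptually, a delete vector marks one bit per row position in a data file. The sketch below uses `java.util.BitSet` as a stand-in for the Roaring bitmap the Puffin blob would actually store; the class and method names are illustrative.

```java
import java.util.BitSet;

// Sketch of a position delete vector: a bit is set for each deleted row
// position in a data file. BitSet stands in for a Roaring bitmap here.
class DeleteVectorSketch {
  private final BitSet deleted = new BitSet();

  void delete(int position) {
    deleted.set(position);
  }

  boolean isDeleted(int position) {
    return deleted.get(position);
  }

  long liveRows(long totalRows) {
    return totalRows - deleted.cardinality();
  }
}
```

A reader would consult the vector while scanning and skip positions whose bit is set.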

Label: Specification

This is based on the changes to the Puffin spec in #11238.

Label: Specification

### Rationale for this change

Updating the Variant and shredding specs from a thorough review.

### What changes are included in this PR?

Spec updates, mostly to the shredding spec...