Ryan Blue
Metadata currently tracks all files using full paths. This is costly when compression is not used (as in the current metadata files) and doesn't allow a table to be easily relocated....
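A minimal sketch of the relocation idea: if metadata stored paths relative to the table's root location, moving the table would only require changing the root. The class and method names below are illustrative, not Iceberg's actual API.

```java
// Hypothetical sketch: resolve a stored path against the table root so the
// table can be relocated by changing only the root location.
public class RelativePaths {
  static String resolve(String tableRoot, String storedPath) {
    // Absolute paths (legacy full-path metadata) are returned unchanged.
    if (storedPath.startsWith("hdfs://")
        || storedPath.startsWith("s3://")
        || storedPath.startsWith("/")) {
      return storedPath;
    }
    // Relative paths are joined to the current table root.
    return tableRoot.endsWith("/") ? tableRoot + storedPath : tableRoot + "/" + storedPath;
  }
}
```

With relative paths, `resolve("s3://new-bucket/db/t", "data/f.parquet")` points at the relocated file without rewriting any metadata.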
The SparkOrcWriter defines converters that could easily throw exceptions when a required column has a null value, like this:

```java
static class RequiredIntConverter implements Converter {
  public void addValue(int rowId, ...
```
Distinct counts aren't very valuable to cost-based optimization because they can't be easily merged. They should be removed. As a replacement, look into storing HLL buffers if they aren't too...
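A small example of why plain distinct counts don't merge: two files can each report a distinct count of 3, but the distinct count of their union is anywhere from 3 (identical value sets) to 6 (disjoint value sets), so no merge rule can recover it from the counts alone. A mergeable sketch like HLL avoids this by unioning its buffers.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: the true distinct count of a union depends on the values, not on
// the per-file counts, so per-file distinct counts cannot be merged.
public class DistinctCountMerge {
  static long distinctOfUnion(int[] a, int[] b) {
    Set<Integer> union = new HashSet<>();
    for (int v : a) union.add(v);
    for (int v : b) union.add(v);
    return union.size();
  }
}
```

`distinctOfUnion({1,2,3}, {1,2,3})` is 3 while `distinctOfUnion({1,2,3}, {4,5,6})` is 6, even though both inputs report distinctCount = 3.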
This PR builds on #447 and updates the properties to use "row group" instead of "block", because "block" is confusing. It also fixes the outstanding review comments so this can...
This is a minor fix. When opening a HadoopInputFile, use its configuration for read options.
This adds an evaluator, `ParquetIndexPageFilter.EvalVisitor` to evaluate an Iceberg Expression against Parquet's page indexes. That produces Parquet's `RowRanges`, which track ranges of rows that should be read from the Parquet...
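As a rough illustration of what row-range evaluation involves (this is a toy, not Parquet's actual `RowRanges` class): each predicate yields a sorted list of surviving row ranges, and an AND expression intersects them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a sorted, non-overlapping list of inclusive row ranges, and
// the intersection used when AND-ing the ranges from two predicates.
public class RowRangesSketch {
  static class Range {
    final long start;
    final long end; // inclusive
    Range(long start, long end) { this.start = start; this.end = end; }
  }

  static List<Range> intersect(List<Range> a, List<Range> b) {
    List<Range> result = new ArrayList<>();
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
      Range r1 = a.get(i);
      Range r2 = b.get(j);
      long start = Math.max(r1.start, r2.start);
      long end = Math.min(r1.end, r2.end);
      if (start <= end) {
        result.add(new Range(start, end)); // overlapping portion survives
      }
      // advance whichever range ends first
      if (r1.end < r2.end) { i++; } else { j++; }
    }
    return result;
  }
}
```

For example, intersecting `[0,99], [200,299]` with `[50,249]` yields `[50,99], [200,249]`, the rows both predicates allow.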
Moving to the planned reader adds default value support. This is the same basic change as in #9366 and #11108.
This adds a blob type to the Puffin spec that can store a Roaring bitmap delete vector. This is in support of the row-level delete improvements proposed for Iceberg v3.
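A minimal sketch of the delete-vector idea: a bitmap keyed by row position within a data file, where a set bit means the row is deleted. The spec stores a Roaring bitmap in a Puffin blob; `java.util.BitSet` below is only a stand-in to show the read-path check, and the class is hypothetical.

```java
import java.util.BitSet;

// Sketch: position-based delete vector for one data file. A reader skips
// any row whose position bit is set.
public class DeleteVectorSketch {
  private final BitSet deleted = new BitSet();

  void delete(long pos) {
    deleted.set((int) pos); // Roaring bitmaps handle the full long range
  }

  boolean isDeleted(long pos) {
    return deleted.get((int) pos);
  }

  boolean isLive(long pos) {
    return !isDeleted(pos);
  }
}
```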
This is based on the changes to the Puffin spec in #11238.
### Rationale for this change

Updates the Variant and shredding specs based on a thorough review.

### What changes are included in this PR?

Spec updates, mostly to the shredding spec...