Add an example of embedding indexes inside a parquet file

Open zhuqi-lucas opened this issue 7 months ago • 0 comments

Which issue does this PR close?

  • Closes #16374

Rationale for this change

//! Example: embedding a "distinct values" index in a Parquet file's metadata
//!
//! 1. Read existing Parquet files
//! 2. Compute distinct values for a target column using DataFusion
//! 3. Serialize the distinct index to bytes and write to the new Parquet file
//!    with these encoded bytes appended as a custom metadata entry
//! 4. Read each new Parquet file, then extract and deserialize the index from the footer
//! 5. Use the distinct index to prune files when querying
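
The five steps above can be sketched with the standard library alone. This is a minimal illustration, not the code added by this PR: a `HashMap` stands in for the Parquet footer's key-value metadata, the key name `distinct_index`, the newline-separated encoding, and all function names are assumptions made for the example, and column values are assumed to be non-empty strings without newlines.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical metadata key; the key used by the actual example may differ.
const INDEX_KEY: &str = "distinct_index";

/// Stand-in for the key-value metadata stored in a Parquet footer.
type FooterMetadata = HashMap<String, String>;

/// Step 2: compute the distinct values of the target column.
fn distinct_values(column: &[&str]) -> BTreeSet<String> {
    column.iter().map(|v| v.to_string()).collect()
}

/// Step 3a: serialize the index, one value per line
/// (assumes values are non-empty and contain no newline).
fn serialize_index(index: &BTreeSet<String>) -> String {
    index.iter().map(String::as_str).collect::<Vec<_>>().join("\n")
}

/// Step 4a: inverse of `serialize_index`.
fn deserialize_index(encoded: &str) -> BTreeSet<String> {
    encoded.lines().map(str::to_string).collect()
}

/// Step 3b: "write" a file by attaching the encoded index as a metadata entry.
fn write_footer(column: &[&str]) -> FooterMetadata {
    let mut footer = FooterMetadata::new();
    footer.insert(
        INDEX_KEY.to_string(),
        serialize_index(&distinct_values(column)),
    );
    footer
}

/// Step 4b: extract the index from the footer, if the file carries one.
fn read_index(footer: &FooterMetadata) -> Option<BTreeSet<String>> {
    footer.get(INDEX_KEY).map(|encoded| deserialize_index(encoded))
}

/// Step 5: keep only files that may contain `value`.
/// A file without an index cannot be pruned, so it is always kept.
fn prune_files<'a>(files: &'a [(&'a str, FooterMetadata)], value: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, footer)| read_index(footer).map_or(true, |index| index.contains(value)))
        .map(|(name, _)| *name)
        .collect()
}
```

Given one file whose column holds `foo` and `bar` and another that holds only `baz`, calling `prune_files(&files, "bar")` skips the second file without reading its data. Keeping files that lack an index is the conservative choice: pruning must never drop a file that could contain matching rows.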

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

zhuqi-lucas, Jun 13 '25 08:06