datafusion
Add an example of embedding indexes inside a parquet file
Which issue does this PR close?
- Closes #16374
Rationale for this change
//! Example: embedding a "distinct values" index in a Parquet file's metadata
//!
//! 1. Read existing Parquet files
//! 2. Compute distinct values for a target column using DataFusion
//! 3. Serialize the distinct index to bytes and write a new Parquet file
//!    with the encoded bytes stored as a custom metadata entry
//! 4. Read each new Parquet file, then extract and deserialize the index from the footer
//! 5. Use the distinct index to prune files when querying
//! 5. Use the distinct index to prune files when querying
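
Steps 3 to 5 can be illustrated without any Parquet machinery. The following is a minimal sketch using only the Rust standard library, not the code from this PR: `DistinctIndex`, its length-prefixed byte encoding, and `prune_files` are hypothetical names and choices made for illustration. In the real example the encoded bytes would be stored in the Parquet footer as a custom metadata entry; here each file is represented as a `(name, index_bytes)` pair.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a per-file "distinct values" index.
#[derive(Debug, PartialEq)]
struct DistinctIndex {
    values: BTreeSet<String>,
}

impl DistinctIndex {
    /// Step 2 (simplified): collect the distinct values of one column.
    fn from_column<'a>(column: impl IntoIterator<Item = &'a str>) -> Self {
        Self {
            values: column.into_iter().map(str::to_owned).collect(),
        }
    }

    /// Step 3: encode each value as a little-endian u32 length followed by
    /// its UTF-8 bytes. This encoding is an assumption made for this sketch.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for v in &self.values {
            out.extend_from_slice(&(v.len() as u32).to_le_bytes());
            out.extend_from_slice(v.as_bytes());
        }
        out
    }

    /// Step 4: decode the bytes; returns `None` if they are truncated or
    /// contain invalid UTF-8.
    fn from_bytes(mut bytes: &[u8]) -> Option<Self> {
        let mut values = BTreeSet::new();
        while !bytes.is_empty() {
            if bytes.len() < 4 {
                return None;
            }
            let (len_bytes, rest) = bytes.split_at(4);
            let len = u32::from_le_bytes([
                len_bytes[0],
                len_bytes[1],
                len_bytes[2],
                len_bytes[3],
            ]) as usize;
            if rest.len() < len {
                return None;
            }
            let (value, rest) = rest.split_at(len);
            values.insert(String::from_utf8(value.to_vec()).ok()?);
            bytes = rest;
        }
        Some(Self { values })
    }

    /// True if a file with this index could contain rows equal to `value`.
    fn may_contain(&self, value: &str) -> bool {
        self.values.contains(value)
    }
}

/// Step 5: keep only the files whose index may contain `value`.
/// A file whose index cannot be decoded is kept, since skipping it could
/// silently drop matching rows.
fn prune_files<'a>(files: &'a [(&'a str, Vec<u8>)], value: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, bytes)| {
            DistinctIndex::from_bytes(bytes).map_or(true, |idx| idx.may_contain(value))
        })
        .map(|(name, _)| *name)
        .collect()
}
```

Pruning with an index like this is only safe for equality predicates on the indexed column: a file is skipped only when the queried value is definitely absent from its distinct set.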