[HUDI-8196] Support pruning based on partition stats index in Hudi Flink
Change Logs
This PR introduces a new partition pruner for Flink source based on the Partition Stats Index.
Before this PR, the Flink source (batch or streaming) used the pushed-down partition filters to build a partition pruner that filters out irrelevant partitions. The Column Stats Index was then used to build a data pruner for file-level data skipping. HUDI-7144 introduced partition-level column stats, so we can use those stats to prune partitions in the same way that files are pruned.
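To illustrate the idea of pruning partitions from partition-level column stats, here is a minimal sketch. It is not the code in this PR: the class `PartitionStatsPruningSketch`, the `ColumnRange` type, and the single `col > literal` predicate are all hypothetical, and the stats are assumed to be already loaded as one min/max range per partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not the pruner added by this PR. */
final class PartitionStatsPruningSketch {

  /** Min/max of one column, aggregated over all files of a partition. */
  static final class ColumnRange {
    final long min;
    final long max;

    ColumnRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Returns true if some value in [min, max] could satisfy `col > literal`. */
  static boolean mayMatchGreaterThan(ColumnRange range, long literal) {
    return range.max > literal;
  }

  /**
   * Keeps the partitions that may contain rows with `col > literal`.
   * A partition without stats (null range) is kept, because skipping it
   * could silently drop matching rows.
   */
  static List<String> prune(Map<String, ColumnRange> statsByPartition, long literal) {
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, ColumnRange> entry : statsByPartition.entrySet()) {
      ColumnRange range = entry.getValue();
      if (range == null || mayMatchGreaterThan(range, literal)) {
        kept.add(entry.getKey());
      }
    }
    return kept;
  }
}
```

The check is the same min/max test used for file-level data skipping, applied one level up: a partition is dropped only when its aggregated range proves that no row can match.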
Main changes:
- Add utilities to fetch Partition Stats Index data for the Flink source.
- Add a new partition pruner, `ColumnStatsPartitionPruner`.
- Add a new config, `read.partition.data.skipping.enabled`, to enable pruning based on partition stats; `false` by default.
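
As a rough sketch of how the new option might be switched on for a Flink SQL table, the helper below builds the table options and renders them as a `WITH` clause. Only the key `read.partition.data.skipping.enabled` comes from this PR; the class and method names are hypothetical, the table path is a placeholder, and whether other options (for example metadata table settings) are also required is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative only: hypothetical helper, not part of Hudi. */
final class PartitionSkippingOptionsSketch {

  /** Table options for a Hudi Flink source with partition-stats pruning enabled. */
  static Map<String, String> sourceOptions(String tablePath) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put("connector", "hudi");
    options.put("path", tablePath); // placeholder path supplied by the caller
    // New option from this PR; it defaults to false, so it must be set explicitly.
    options.put("read.partition.data.skipping.enabled", "true");
    return options;
  }

  /** Renders the options as the WITH (...) clause of a Flink SQL CREATE TABLE. */
  static String toWithClause(Map<String, String> options) {
    StringJoiner joiner = new StringJoiner(",\n  ", "WITH (\n  ", "\n)");
    for (Map.Entry<String, String> entry : options.entrySet()) {
      joiner.add("'" + entry.getKey() + "' = '" + entry.getValue() + "'");
    }
    return joiner.toString();
  }
}
```

Calling `toWithClause(sourceOptions("file:///tmp/hudi_table"))` yields a clause that can be appended to a `CREATE TABLE` statement; the path here is only an example.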
Impact
Enhance the data skipping ability for Flink source by introducing a new partition pruner based on Partition Stats Index.
Risk level (write none, low, medium or high below)
low
Documentation Update
none
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@danny0405 please take another look, thanks.
CI report:
- efde4491a94d517528c04c94449be0aee37d262e Azure: SUCCESS