[HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource
Change Logs
Add a new config, HOODIE_SPARK_DATASOURCE_OPTIONS, whose value is passed to the Spark DataFrame reader used by HoodieIncrSource. Options defined in DataSourceOptions.scala, such as enabling the metadata table or data skipping, can be passed through it for more efficient file pruning.
https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
Impact
Files will be pruned using column stats and the other available mechanisms, making HoodieIncrSource more efficient.
Risk level (write none, low, medium or high below)
Low
Documentation Update
HOODIE_SPARK_DATASOURCE_OPTIONS is the new config being added. It takes a comma-separated list of options that are passed to the Spark DataFrame reader of a Hudi table, e.g. hoodie.metadata.enable=true,hoodie.enable.data.skipping=true.
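To illustrate the value format described above, here is a minimal sketch of how a comma-separated `key=value` string could be turned into an options map. The class `DataSourceOptionsSketch` and the method `parseOptions` are hypothetical names chosen for this example; they are not taken from the PR, and the actual parsing in HoodieIncrSource may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: illustrates the "k1=v1,k2=v2" format only.
final class DataSourceOptionsSketch {

  // Splits a comma-separated list of key=value pairs into an insertion-ordered map.
  public static Map<String, String> parseOptions(String raw) {
    Map<String, String> options = new LinkedHashMap<>();
    if (raw == null || raw.trim().isEmpty()) {
      return options; // nothing configured, so no extra reader options
    }
    for (String pair : raw.split(",")) {
      String trimmed = pair.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray or trailing commas
      }
      int idx = trimmed.indexOf('=');
      if (idx <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
      }
      // Split on the first '=' only, so values may themselves contain '='.
      options.put(trimmed.substring(0, idx).trim(), trimmed.substring(idx + 1).trim());
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> options =
        parseOptions("hoodie.metadata.enable=true,hoodie.enable.data.skipping=true");
    options.forEach((key, value) -> System.out.println(key + " -> " + value));
  }
}
```

A map like this could then be handed to Spark's `DataFrameReader.options(java.util.Map)` before loading the Hudi table. Note that this simple format cannot express option values that themselves contain commas.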
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
Hey @vinishjail97, can you address the review comments from Sagar?
CI report:
- b91da909a18c11702b917910846356e98aeaecf2 Azure: SUCCESS