[HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource

Open vinishjail97 opened this issue 5 months ago • 1 comments

Change Logs

Add a new config, HOODIE_SPARK_DATASOURCE_OPTIONS, which is used by the Spark DataFrame reader for HoodieIncrSource. Options present in DataSourceOptions.scala, such as enabling the metadata table and data skipping, can be passed for efficient pruning of files.

https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala

Impact

Files will be pruned using column stats (colstats) and the other available mechanisms, making HoodieIncrSource more efficient.

Risk level (write none, low, medium or high below)

Low

Documentation Update

HOODIE_SPARK_DATASOURCE_OPTIONS is the new config being added: a comma-separated list of options that can be passed to the Spark DataFrame reader of a Hudi table, e.g. hoodie.metadata.enable=true,hoodie.enable.data.skipping=true.
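
For illustration only, here is a minimal sketch of how a comma-separated key=value string like the one above could be turned into a map of reader options. The class name `ReaderOptionsSketch` and the method `parse` are hypothetical and not part of Hudi; the parsing in the actual change may differ (for example in how whitespace or malformed entries are handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the parsing code used by HoodieIncrSource.
final class ReaderOptionsSketch {

    // Splits "k1=v1,k2=v2" into an ordered map of option name -> option value.
    static Map<String, String> parse(String raw) {
        Map<String, String> opts = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return opts; // no options configured
        }
        for (String pair : raw.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
            }
            // Split on the first '=' only, so values may themselves contain '='.
            opts.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return opts;
    }
}
```

A map built this way could then be handed to Spark's `DataFrameReader.options(java.util.Map)`, e.g. `spark.read().format("hudi").options(opts)`. Note that this sketch cannot represent option values that themselves contain a comma.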

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Change Logs and Impact were stated clearly
  • [x] Adequate tests were added if applicable
  • [ ] CI passed

vinishjail97, Mar 21 '24 06:03