
[HUDI-8746] Support multi base file formats through fg reader

Open • linliu-code opened this issue 10 months ago • 1 comment

Change Logs

Created a new file format, HoodieFileGroupReaderBasedFileFormat, to support multiple base file formats; it is guarded by the existing configuration. The main changes are:

  1. Create a SparkOrcReaders class for Spark 3.5, Spark 3.4, and Spark 3.3.
  2. Use a map that holds both the ORC and Parquet readers, and pass it to the Spark file group reader.

The ORC file reader implementation is based on Spark's ORC reader implementation.
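A rough sketch of change 2 (keeping one reader per base file format in a map and dispatching on the file's format) is shown here in Java for illustration only. All names (FileFormat, BaseFileReader, buildReaders, readBaseFile) are hypothetical and are not the Hudi or Spark API; the stub readers just return strings, and detecting the format from the file extension is an assumption made for this sketch.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: FileFormat, BaseFileReader and the stub readers are
// illustrative names only, not Hudi or Spark classes.
public class MultiFormatReaderSketch {

    // Base file formats covered by the change (ORC and Parquet).
    enum FileFormat { PARQUET, ORC }

    // Stand-in for a format-specific base file reader.
    interface BaseFileReader {
        String read(String path);
    }

    // Assumption for this sketch: the format is inferred from the file extension.
    static FileFormat formatOf(String path) {
        String lower = path.toLowerCase();
        if (lower.endsWith(".parquet")) {
            return FileFormat.PARQUET;
        }
        if (lower.endsWith(".orc")) {
            return FileFormat.ORC;
        }
        throw new IllegalArgumentException("Unsupported base file format: " + path);
    }

    // Build the map holding one reader per format; the file group reader receives this map.
    static Map<FileFormat, BaseFileReader> buildReaders() {
        Map<FileFormat, BaseFileReader> readers = new EnumMap<>(FileFormat.class);
        readers.put(FileFormat.PARQUET, path -> "parquet:" + path); // stub Parquet reader
        readers.put(FileFormat.ORC, path -> "orc:" + path);         // stub ORC reader
        return readers;
    }

    // Look up the reader for the base file's format and delegate to it.
    static String readBaseFile(Map<FileFormat, BaseFileReader> readers, String path) {
        FileFormat format = formatOf(path);
        BaseFileReader reader = readers.get(format);
        if (reader == null) {
            throw new IllegalStateException("No reader registered for format " + format);
        }
        return reader.read(path);
    }

    public static void main(String[] args) {
        Map<FileFormat, BaseFileReader> readers = buildReaders();
        System.out.println(readBaseFile(readers, "fg1/base-001.parquet"));
        System.out.println(readBaseFile(readers, "fg2/base-002.orc"));
    }
}
```

Keying the readers by format means that supporting another base file format would, in this sketch, only require registering one more map entry rather than adding a branch inside the file group reader.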

Impact

Supports multiple base file formats (ORC and Parquet) for Spark 3.3, Spark 3.4, and Spark 3.5.

Risk level (write none, low, medium or high below)

Medium.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default values of existing configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

linliu-code • Jan 12 '25 22:01

CI report:

  • 4a9523fb3aff19bb346aa96d411ddab814e262ee UNKNOWN
  • e8bc010fa97e5efc72352dd5f7e5e4cfe0f5dd10 UNKNOWN
  • a0b4702a2ca470bd4371ee21f56567400f182c00 UNKNOWN
  • 083065989fa8b7e9ba232c8c3f41bc726e769997 UNKNOWN
  • 72e95de4b94d13b2d3f587e6185c4bbd8a5c1214 UNKNOWN
  • 0cbdb636776036d8bdc9185b9381ba9d8084d01d UNKNOWN
  • 27996fcf1826b7de526ea2438bf491692a6f179c UNKNOWN
  • 6a4bc9da7f31d162b32cd1d10ff6c1f69e5b16d6 UNKNOWN
  • 65676d07105dcdae6c15fdad6bcd70e9ee7ad8e7 Azure: FAILURE
Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot • May 09 '25 16:05