[WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive

Open jonvex opened this issue 1 year ago • 6 comments

Change Logs

Replace the existing Hive read logic with the file group reader.

HoodieFileGroupReader is the generic file group reader implementation, intended for use by all engines. This PR adds HoodieFileGroupReaderRecordReader, which implements RecordReader. HoodieFileGroupReaderRecordReader uses HoodieFileGroupReader with HiveHoodieReaderContext to read file groups (COW, MOR, bootstrap) with the Hive/Hadoop engine.
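
To make the wrapping idea concrete, here is a minimal, self-contained sketch of the adapter shape described above. It is not the code in this PR: GenericFileGroupReader, SimpleRecordReader, Row, InMemoryFileGroupReader and AdapterRecordReader are all hypothetical stand-ins, and the simplified next(value) contract only loosely mirrors Hadoop's RecordReader. It shows just one thing: an engine-agnostic reader being exposed through a pull-style record reader interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FileGroupReaderAdapterSketch {

    /** Hypothetical stand-in for an engine-agnostic reader; NOT Hudi's actual API. */
    interface GenericFileGroupReader<T> extends AutoCloseable {
        boolean hasNext();
        T next();
        @Override
        void close();
    }

    /** Simplified stand-in for a pull-style record reader that fills a reusable value. */
    interface SimpleRecordReader<V> {
        boolean next(V reusableValue);
        V createValue();
        void close();
    }

    /** Mutable row holder, loosely analogous to a reusable Writable. */
    static final class Row {
        String payload;
    }

    /** In-memory reader, present only so the sketch runs without any files. */
    static final class InMemoryFileGroupReader implements GenericFileGroupReader<String> {
        private final Iterator<String> records;
        boolean closed = false;

        InMemoryFileGroupReader(List<String> records) {
            this.records = records.iterator();
        }

        @Override
        public boolean hasNext() {
            return records.hasNext();
        }

        @Override
        public String next() {
            return records.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Adapter: exposes the generic reader through the engine-facing interface. */
    static final class AdapterRecordReader implements SimpleRecordReader<Row> {
        private final GenericFileGroupReader<String> delegate;

        AdapterRecordReader(GenericFileGroupReader<String> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean next(Row reusableValue) {
            if (!delegate.hasNext()) {
                return false; // no more records in this file group
            }
            reusableValue.payload = delegate.next(); // copy into the caller's object
            return true;
        }

        @Override
        public Row createValue() {
            return new Row();
        }

        @Override
        public void close() {
            delegate.close();
        }
    }

    /** Drains a reader the way an engine loop would, then closes it. */
    static List<String> readAll(SimpleRecordReader<Row> reader) {
        List<String> out = new ArrayList<>();
        Row row = reader.createValue();
        while (reader.next(row)) {
            out.add(row.payload);
        }
        reader.close();
        return out;
    }

    public static void main(String[] args) {
        InMemoryFileGroupReader source =
            new InMemoryFileGroupReader(Arrays.asList("r1", "r2", "r3"));
        System.out.println(readAll(new AdapterRecordReader(source)));
    }
}
```

In this shape the engine-specific class stays thin, and the merge and read logic would live in the shared reader, which is the maintainability argument made under Impact.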

Impact

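The Hive read path will be more maintainable, because it will use the same generic file group reader as the other engines instead of separate read logic.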

Risk level (write none, low, medium or high below)

High. This needs a lot of testing.

Documentation Update

N/A

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

jonvex avatar Dec 28 '23 01:12 jonvex

@vinothchandar

jonvex avatar Jan 04 '24 03:01 jonvex

@bvaradar Can you help the review of the hive related code?

danny0405 avatar Feb 03 '24 02:02 danny0405

@bvaradar Can you help the review of the hive related code?

Yes, @danny0405. Will review this PR.

bvaradar avatar Feb 07 '24 06:02 bvaradar

Thanks @bvaradar, we all appreciate it!

jonvex avatar Feb 07 '24 17:02 jonvex

Overall looks good to me. @jonvex: What Hive versions are we targeting/testing?

@bvaradar I used the docker demo to test, which I think uses Hive 2. We would like this to replace the existing implementation, so the goal is to support everything that works when the file group reader is disabled.

jonvex avatar Feb 20 '24 15:02 jonvex

CI report:

  • 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
  • 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
  • e95bcb80e4b729677ef65be41abc30e8c4ce5c03 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Jun 09 '24 00:06 hudi-bot

@jonvex could you check in the code for testing the new file group reader on Hive 3 from #11398?

yihua avatar Jun 23 '24 20:06 yihua