hudi
[WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive
Change Logs
Replace the existing Hive read logic with the file group reader.
HoodieFileGroupReader is the generic implementation of a file group reader and is intended to be used by all engines. I created HoodieFileGroupReaderRecordReader, which implements RecordReader. HoodieFileGroupReaderRecordReader uses HoodieFileGroupReader with HiveHoodieReaderContext to read file groups (COW, MOR, bootstrap) with the Hive/Hadoop engine.
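For reviewers who want a mental model of the layering described above, here is a minimal, self-contained sketch of the adapter shape. It is not the code in this PR: every type in it (`ReaderContext`, `GenericFileGroupReader`, `SimpleRecordReader`, `Adapter`, `Row`) is a hypothetical stand-in, the record-reader contract is a simplified imitation of the Hadoop one, and the merge is a toy "log record replaces base record with the same key" rule rather than Hudi's actual merging.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FileGroupReaderSketch {

    /** Hypothetical engine-specific hooks; plays the role HiveHoodieReaderContext has in the PR. */
    interface ReaderContext<T> {
        String recordKey(T record);
    }

    /** Hypothetical engine-agnostic reader; plays the role of HoodieFileGroupReader. */
    static final class GenericFileGroupReader<T> {
        private final Iterator<T> merged;

        GenericFileGroupReader(ReaderContext<T> ctx, List<T> baseRecords, List<T> logRecords) {
            // Toy merge (an assumption, not Hudi's logic): a log record
            // replaces the base record that has the same key.
            Map<String, T> byKey = new LinkedHashMap<>();
            for (T r : baseRecords) byKey.put(ctx.recordKey(r), r);
            for (T r : logRecords) byKey.put(ctx.recordKey(r), r);
            this.merged = byKey.values().iterator();
        }

        boolean hasNext() { return merged.hasNext(); }
        T next() { return merged.next(); }
    }

    /** Mutable row that the caller allocates once and the reader refills. */
    static final class Row {
        String key;
        String value;
        Row() {}
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    /** Hypothetical, simplified imitation of a Hadoop-style record-reader contract. */
    interface SimpleRecordReader<V> {
        V createValue();
        boolean next(V reuse); // fills `reuse`; returns false when exhausted
    }

    /** Adapter: exposes the generic reader through the engine's record-reader contract. */
    static final class Adapter implements SimpleRecordReader<Row> {
        private final GenericFileGroupReader<Row> delegate;

        Adapter(List<Row> base, List<Row> log) {
            // The lambda is the engine-specific context: it knows how to get a key from a Row.
            this.delegate = new GenericFileGroupReader<>(row -> row.key, base, log);
        }

        @Override public Row createValue() { return new Row(); }

        @Override public boolean next(Row reuse) {
            if (!delegate.hasNext()) return false;
            Row r = delegate.next();
            reuse.key = r.key;
            reuse.value = r.value;
            return true;
        }
    }

    public static void main(String[] args) {
        List<Row> base = Arrays.asList(new Row("a", "1"), new Row("b", "2"));
        List<Row> log = Arrays.asList(new Row("b", "3"), new Row("c", "4"));
        Adapter reader = new Adapter(base, log);
        Row row = reader.createValue();
        while (reader.next(row)) {
            System.out.println(row.key + "=" + row.value); // prints a=1, b=3, c=4 on separate lines
        }
    }
}
```

The point of the shape is that only the context and the thin adapter are engine-specific; the merging reader itself is shared, which is where the maintainability benefit would come from.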
Impact
The Hive read path will be more maintainable because it shares the generic file group reader with the other engines.
Risk level (write none, low, medium or high below)
High. This needs a lot of testing.
Documentation Update
N/A
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@vinothchandar
@bvaradar Can you help the review of the hive related code?
Yes @danny0405. Will review this PR.
Thanks @bvaradar, we all appreciate it!
Overall looks good to me. @jonvex: What Hive versions are we targeting/testing?
@bvaradar I used the docker demo to test. I think that is using Hive 2. We would like this to replace the existing implementation, so the goal is to support everything that works when the file group reader is disabled.
CI report:
- 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
- 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
- e95bcb80e4b729677ef65be41abc30e8c4ce5c03 Azure: SUCCESS
@jonvex could you check in the code for testing the new file group reader on Hive 3 from #11398?