[HUDI-7261] TVF to query hudi table's filesystem state through spark-sql
Depends on [HUDI-7243]
Change Logs
A new TVF, hudi_filesystem_view(...), is added to support querying a table's filesystem state through spark-sql. The information displayed is influenced by the 'fsview' command of hudi-cli.
A new relation, FileSystemRelation, is added to transparently support this functionality. The relation implements the buildScan(...) method of the TableScan trait. It does not support filter or predicate push-down, so column filtering and predicate evaluation need to be done by the execution layer. This seems reasonable for an initial implementation, since the tool is mainly going to be used for debugging and introspection. The relation defines a fixed schema for displaying basic file information of a given Hudi table.
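For reviewers less familiar with scan-only relations, here is a minimal, engine-free sketch of what "no push-down" means in practice. It is not the FileSystemRelation code from this PR: the class, the execute helper, and the column names in SCHEMA are all hypothetical, chosen only to illustrate a relation that always returns every row and column of a fixed schema while a separate execution layer filters and projects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Sequence

# Hypothetical fixed schema; the real column names are not listed in this PR description.
SCHEMA = ("partition", "file_id", "base_instant", "data_file_size")

Row = Dict[str, object]


@dataclass
class FullScanRelation:
    """Toy stand-in for a relation that only supports a full scan."""

    rows: List[Row]

    def build_scan(self) -> Iterator[Row]:
        # Takes no arguments: the relation never sees the required columns
        # or the filters, so it emits every row with every schema column.
        for row in self.rows:
            yield {col: row[col] for col in SCHEMA}


def execute(
    relation: FullScanRelation,
    columns: Sequence[str],
    predicate: Callable[[Row], bool],
) -> List[Row]:
    """Toy execution layer: evaluates the predicate and projects after the scan."""
    return [
        {col: row[col] for col in columns}
        for row in relation.build_scan()
        if predicate(row)
    ]


relation = FullScanRelation(
    rows=[
        {"partition": "2024/01/01", "file_id": "f1", "base_instant": "001", "data_file_size": 1024},
        {"partition": "2024/01/02", "file_id": "f2", "base_instant": "002", "data_file_size": 2048},
    ]
)

# The scan still produces both rows; filtering and projection happen afterwards.
result = execute(relation, ["file_id"], lambda r: r["partition"] == "2024/01/01")
print(result)
```

The trade-off is that every query pays for a full listing, which is acceptable for a debugging tool but would matter if the view were queried frequently on large tables.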
Impact
A new TVF is added to introspect the filesystem state of a given Hudi table through spark-sql.
Risk level (write none, low, medium or high below)
Low
Documentation Update
TBD
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@bhat-vinay : Landed the other PR. Please resolve conflicts and rebase
Thanks for the review @bvaradar. @codope pointed out that the failing tests could be fixed by https://github.com/apache/hudi/pull/10381. I rebased past it to see if I can get a clean run.
CI report:
- c64e1e3a9816b278606ee32aede728ffb928708c Azure: SUCCESS