[HUDI-7261] TVF to query hudi table's filesystem state through spark-sql

Open bhat-vinay opened this issue 1 year ago • 1 comments

Depends on [HUDI-7243]

Change Logs

A new TVF, hudi_filesystem_view(...), is added to support querying a hudi table's filesystem state through spark-sql. The information displayed is based on the 'fsview' command of hudi-cli.

A new relation, FileSystemRelation, is added to transparently support this functionality. The relation implements the buildScan(...) method of the TableScan trait. It does not support filter or predicate push-down, so column filtering and predicate evaluation need to be done by the execution layer. This seems reasonable for an initial implementation, since the tool is mainly going to be used for debugging and introspection. The relation defines a fixed schema for displaying basic file information of a given hudi table.
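To illustrate what "no push-down" means in practice, here is a minimal conceptual sketch in plain Python. It is not the Spark or Hudi API. The class name, the execute helper, and the column names are all hypothetical, and the actual schema is whatever the PR defines. The scan always returns every row and every column, and a stand-in for the execution layer applies the predicate and the projection afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, object]

# Hypothetical fixed schema, for illustration only (not the PR's real columns).
SCHEMA = ("partition", "file_id", "base_file", "num_log_files")


@dataclass
class FullScanRelation:
    """Models a relation that supports only a full scan (no push-down)."""

    rows: List[Row]

    @property
    def schema(self) -> Sequence[str]:
        return SCHEMA

    def build_scan(self) -> Iterator[Row]:
        # Takes no filters and no required-column list:
        # every row is produced with every column.
        for row in self.rows:
            yield dict(row)


def execute(relation: FullScanRelation,
            predicate: Callable[[Row], bool],
            columns: Sequence[str]) -> List[Row]:
    """Stand-in for the execution layer: filter first, then project."""
    unknown = [c for c in columns if c not in relation.schema]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return [
        {c: row[c] for c in columns}
        for row in relation.build_scan()
        if predicate(row)
    ]


if __name__ == "__main__":
    relation = FullScanRelation([
        {"partition": "2023/12/26", "file_id": "f1",
         "base_file": "f1.parquet", "num_log_files": 0},
        {"partition": "2023/12/27", "file_id": "f2",
         "base_file": "f2.parquet", "num_log_files": 2},
    ])
    # Roughly: SELECT file_id, num_log_files ... WHERE num_log_files > 0
    print(execute(relation, lambda r: r["num_log_files"] > 0,
                  ["file_id", "num_log_files"]))
```

The cost of this design is that the full file listing is materialized for every query. For a debugging tool over file metadata that is likely acceptable, which matches the reasoning in the description above.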

Impact

A new TVF is added to introspect the filesystem state of a given hudi table through spark-sql.

Risk level (write none, low, medium or high below)

Low

Documentation Update

TBD

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

bhat-vinay avatar Dec 26 '23 16:12 bhat-vinay

@bhat-vinay : Landed the other PR. Please resolve conflicts and rebase

bvaradar avatar Dec 29 '23 21:12 bvaradar

Thanks for the review @bvaradar. @codope pointed out that the failing tests could be fixed by https://github.com/apache/hudi/pull/10381. Rebased past it to see if I can get a clean run.

bhat-vinay avatar Jan 03 '24 03:01 bhat-vinay

CI report:

  • c64e1e3a9816b278606ee32aede728ffb928708c Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Jan 03 '24 05:01 hudi-bot