hudi
[HUDI-6563] Supports flink lookup join
Change Logs
Supports Flink lookup join on Hudi tables.
Usage:

```sql
CREATE TABLE datagen_source (
  id INT,
  name STRING,
  proctime AS PROCTIME()
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1',
  'number-of-rows' = '2',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '2'
);

SELECT o.id, o.name, b.id AS id2
FROM datagen_source AS o
JOIN hudi_table /*+ OPTIONS('lookup.join.cache.ttl' = '2 day') */
  FOR SYSTEM_TIME AS OF o.proctime AS b
  ON o.id = b.id;
```
This follows essentially the same principle as the Hive lookup join in Flink: the Hudi table data is cached in memory with a configurable TTL, and lookups are served from that cache.
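To make "served from that cache" concrete, here is a minimal Java sketch of the lookup side of the query above: the cached dimension rows are indexed by the join key, and each stream row probes that index with inner-join semantics (a stream row with no match produces no output). The class, record, and method names are hypothetical; this is an illustration, not the connector's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a processing-time lookup join against an in-memory
// snapshot of the dimension (Hudi) table. Not the Hudi/Flink implementation.
public class LookupJoinSketch {

    // A dimension row from the (hypothetical) cached Hudi table snapshot.
    public record DimRow(int id, String value) {}

    // A probe row from the stream side (e.g. datagen_source).
    public record ProbeRow(int id, String name) {}

    // Joined output, mirroring `SELECT o.id, o.name, b.id AS id2`.
    public record Joined(int id, String name, int id2) {}

    // Index the snapshot by join key; a key may map to several rows.
    public static Map<Integer, List<DimRow>> index(List<DimRow> snapshot) {
        Map<Integer, List<DimRow>> byKey = new HashMap<>();
        for (DimRow row : snapshot) {
            byKey.computeIfAbsent(row.id(), k -> new ArrayList<>()).add(row);
        }
        return byKey;
    }

    // Inner-join semantics: every probe row is matched against the index,
    // and probe rows without a matching key are dropped.
    public static List<Joined> join(List<ProbeRow> probes, Map<Integer, List<DimRow>> byKey) {
        List<Joined> out = new ArrayList<>();
        for (ProbeRow o : probes) {
            for (DimRow b : byKey.getOrDefault(o.id(), List.of())) {
                out.add(new Joined(o.id(), o.name(), b.id()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<DimRow>> byKey =
                index(List.of(new DimRow(1, "a"), new DimRow(3, "c")));
        List<Joined> result =
                join(List.of(new ProbeRow(1, "x"), new ProbeRow(2, "y")), byKey);
        System.out.println(result); // only id 1 finds a match
    }
}
```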
Impact
Users can perform Flink SQL lookup joins against Hudi tables.
Risk level (write none, low, medium or high below)
low
Contributor's checklist
- [X] Read through contributor's guide
- [X] Change Logs and Impact were stated clearly
- [X] Adequate tests were added if applicable
- [X] CI passed
@hudi-bot run azure
Thanks for the contribution @waywtdcc, can you explain at a high level how the Hudi table is loaded and what the refresh strategy of the table is?
Flink's FileSystemLookupFunction is reused here. On first access, all data is loaded into the memory of each task; after that, the in-memory data is refreshed at regular intervals according to the configured cache TTL.
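The load-and-refresh behavior described here can be sketched as a per-task cache that loads the whole table on the first lookup and reloads it once the TTL has elapsed. This is a hypothetical Java illustration, not the API of Flink's `FileSystemLookupFunction`: `TtlFullTableCache`, the loader `Supplier`, and the injectable millisecond clock are assumptions chosen to make the timing easy to follow and test.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical per-task full-table cache: load everything on the first
// lookup, then reload the whole snapshot once the TTL has elapsed.
public class TtlFullTableCache<K, V> {

    private final Supplier<List<V>> loader;   // reads the full table snapshot
    private final Function<V, K> keyOf;       // extracts the join key from a row
    private final long ttlMillis;             // reload interval
    private final LongSupplier clockMillis;   // injectable clock (for testing)

    private Map<K, List<V>> cache;            // null until the first load
    private long nextReloadAt;
    private int loadCount;

    public TtlFullTableCache(Supplier<List<V>> loader, Function<V, K> keyOf,
                             Duration ttl, LongSupplier clockMillis) {
        this.loader = loader;
        this.keyOf = keyOf;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    // Return all cached rows for the key, (re)loading the table if needed.
    public List<V> lookup(K key) {
        long now = clockMillis.getAsLong();
        if (cache == null || now >= nextReloadAt) {
            reload(now);
        }
        return cache.getOrDefault(key, List.of());
    }

    // Rebuild the key index from a fresh snapshot and swap it in.
    private void reload(long now) {
        Map<K, List<V>> fresh = new HashMap<>();
        for (V row : loader.get()) {
            fresh.computeIfAbsent(keyOf.apply(row), k -> new ArrayList<>()).add(row);
        }
        cache = fresh;
        nextReloadAt = now + ttlMillis;
        loadCount++;
    }

    public int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        List<String> table = new ArrayList<>(List.of("1:a"));
        TtlFullTableCache<String, String> cache = new TtlFullTableCache<>(
                () -> List.copyOf(table), row -> row.split(":")[0],
                Duration.ofMinutes(60), () -> now[0]);

        System.out.println(cache.lookup("1")); // first lookup triggers the initial load
        table.add("2:b");                      // table changes after the load
        System.out.println(cache.lookup("2")); // still the old snapshot: no match
        now[0] = Duration.ofMinutes(60).toMillis();
        System.out.println(cache.lookup("2")); // TTL elapsed: reloaded, now matches
    }
}
```

Reloading the whole snapshot and swapping the map keeps lookups simple, at the cost of holding the full table in each task's memory and serving data that may be up to one TTL stale.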
@hudi-bot run azure
You mean Flink itself would take care of refreshing the data?
@danny0405 hello?
@waywtdcc Hi, can you rebase onto the latest master? Then I will take a look at this PR.
CI report:
- 5f38864fefcf6a306d264de72b316eeeb3459cb6 Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build