[HUDI-6563] Supports flink lookup join

Open waywtdcc opened this issue 2 years ago • 8 comments

Change Logs

Supports flink lookup join

Example usage:

`CREATE TABLE datagen_source (id INT, name STRING, proctime AS PROCTIME()) WITH ('connector' = 'datagen', 'rows-per-second' = '1', 'number-of-rows' = '2', 'fields.id.kind' = 'sequence', 'fields.id.start' = '1', 'fields.id.end' = '2');`

`SELECT o.id, o.name, b.id AS id2 FROM datagen_source AS o JOIN hudi_table /*+ OPTIONS('lookup.join.cache.ttl' = '2 day') */ FOR SYSTEM_TIME AS OF o.proctime AS b ON o.id = b.id;`

This works on basically the same principle as the Hive lookup join: the Hudi table data is cached in memory with a configurable TTL, and lookups are served from that cache.
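The caching behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `TtlLookupCache`, its `loader` and `clock` parameters, and `loadCount()` are hypothetical names, and the real lookup function works on Flink row data rather than plain Java maps.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a full-reload lookup cache with a TTL; not a Hudi or Flink class. */
class TtlLookupCache<K, V> {
    private final Supplier<Map<K, V>> loader; // assumed: reads a full snapshot of the table
    private final long ttlMillis;             // how long a loaded snapshot is served
    private final LongSupplier clock;         // injectable clock (milliseconds), for testing
    private Map<K, V> cache = new HashMap<>();
    private long nextLoadTime = Long.MIN_VALUE; // forces a load on the first lookup
    private int loadCount = 0;

    TtlLookupCache(Supplier<Map<K, V>> loader, Duration ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Serves the key from memory, reloading the whole snapshot once the TTL has expired. */
    Optional<V> lookup(K key) {
        long now = clock.getAsLong();
        if (now >= nextLoadTime) {
            cache = new HashMap<>(loader.get()); // replace the snapshot wholesale
            nextLoadTime = now + ttlMillis;
            loadCount++;
        }
        return Optional.ofNullable(cache.get(key));
    }

    /** Number of full loads performed so far. */
    int loadCount() {
        return loadCount;
    }
}
```

In this sketch, a row written to the table after a load only becomes visible to lookups once the TTL expires and the next lookup triggers a reload, so a long TTL such as `'2 day'` trades freshness for fewer full scans.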

Impact

Supports flink lookup join

Risk level (write none, low, medium or high below)

low

Contributor's checklist

  • [X] Read through contributor's guide
  • [X] Change Logs and Impact were stated clearly
  • [X] Adequate tests were added if applicable
  • [X] CI passed

waywtdcc avatar Jul 19 '23 08:07 waywtdcc

@hudi-bot run azure

waywtdcc avatar Jul 21 '23 06:07 waywtdcc

Thanks for the contribution @waywtdcc, can you explain at a high level how the Hudi table is loaded and what the refresh strategy of the table is?

danny0405 avatar Jul 28 '23 01:07 danny0405

> Thanks for the contribution @waywtdcc, can you explain at a high level how the Hudi table is loaded and what the refresh strategy of the table is?

Flink's FileSystemLookupFunction is reused here. First load: all data is loaded into the memory of each task. Subsequent updates: the data in memory is refreshed at regular intervals.

waywtdcc avatar Jul 31 '23 08:07 waywtdcc

@hudi-bot run azure

waywtdcc avatar Jul 31 '23 08:07 waywtdcc

> > Thanks for the contribution @waywtdcc, can you explain at a high level how the Hudi table is loaded and what the refresh strategy of the table is?
>
> Flink's FileSystemLookupFunction is reused here. First load: all data is loaded into the memory of each task. Subsequent updates: the data in memory is refreshed at regular intervals.

So you mean Flink itself takes care of refreshing the data?

danny0405 avatar Jul 31 '23 11:07 danny0405

@danny0405 hello?

waywtdcc avatar Aug 01 '23 01:08 waywtdcc

@waywtdcc Hi, can you rebase onto the latest master? I will then take a look at this PR.

danny0405 avatar May 13 '24 03:05 danny0405

CI report:

  • 5f38864fefcf6a306d264de72b316eeeb3459cb6 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • `@hudi-bot run azure`: re-run the last Azure build

hudi-bot avatar May 15 '24 13:05 hudi-bot