[Flink] Arctic table supports real-time dimension table join
Background: Some use cases require real-time data widening. Hive currently supports lookup join, but that solution is not production-ready, since the Hive table has to be loaded into memory and large tables are prone to OOM errors. In addition, neither Iceberg nor Hudi supports lookup joins.
Here is a summary of the proposal: Flink provides the event-time temporal join. The right table is used as a versioned table, and its data can be kept in RocksDB instead of memory.
-- create a left table, using localtimestamp as event time.
create table source (
...,
arcitc_process_time AS LOCALTIMESTAMP,
WATERMARK FOR arcitc_process_time AS arcitc_process_time
) with (...);
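-- create the Arctic dimension table with the dim-table option enabled.
create table arctic_dim (...) with ('connector'='arctic', 'dim-table.enabled'='true');
-- join each source row with the dimension row that is valid at the row's event time.
select * from source as O left join arctic_dim FOR SYSTEM_TIME AS OF O.arcitc_process_time as P on O.id = P.id;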
When dim-table.enabled is set to true, the Arctic source automatically creates a custom watermark strategy.
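To make the join semantics concrete, here is a minimal plain-Python sketch of what an event-time temporal join against a versioned table computes. It is not Flink or Arctic code: `VersionedTable`, `upsert`, `as_of`, and `temporal_left_join` are hypothetical names for illustration, timestamps are plain integers, and versions are assumed to arrive in timestamp order per key. It also ignores watermarks and state cleanup, which Flink handles in its state backend.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedTable:
    """Hypothetical in-memory stand-in for the right-hand (dimension) table.

    Keeps every version of each key, ordered by the time it became valid.
    """

    def __init__(self):
        self._times = defaultdict(list)  # key -> sorted version timestamps
        self._rows = defaultdict(list)   # key -> rows, parallel to _times

    def upsert(self, key, ts, row):
        # Assumption: versions for a key arrive in non-decreasing ts order.
        self._times[key].append(ts)
        self._rows[key].append(row)

    def as_of(self, key, ts):
        """Return the latest version of `key` with version time <= ts, else None."""
        times = self._times.get(key)
        if not times:
            return None
        i = bisect_right(times, ts)
        return self._rows[key][i - 1] if i else None


def temporal_left_join(left_rows, dim):
    """Mimics: left LEFT JOIN dim FOR SYSTEM_TIME AS OF left.ts ON left.id = dim.id."""
    return [(row, dim.as_of(row["id"], row["ts"])) for row in left_rows]


dim = VersionedTable()
dim.upsert(1, 10, {"id": 1, "name": "a-v1"})
dim.upsert(1, 20, {"id": 1, "name": "a-v2"})

left = [
    {"id": 1, "ts": 5},   # before any version exists
    {"id": 1, "ts": 15},  # falls in the first version's validity window
    {"id": 1, "ts": 25},  # falls in the second version's validity window
    {"id": 2, "ts": 25},  # key not present in the dimension table
]
result = temporal_left_join(left, dim)
```

Because it is a left join, source rows with no matching version are still emitted, paired with `None` in place of the dimension columns.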
Hello @zstraw, could you share your design document here?
Here's a quick design doc. https://docs.google.com/document/d/1gYgZsHRHnlFr-fqPMv1xqv_wo5lc4_paMseGselGlUM/edit?usp=sharing