[flink] refactor the code of the lookup and support computing the changelog generated by compact during read time.
Purpose
Linked issue: close #3868
Refactor the lookup code and support computing, at read time, the changelog generated by compaction.
This applies when the changelog producer is none but CoreOptions#needLookup is true and the table is used as a dimension table.
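For readers unfamiliar with the idea, here is a minimal sketch of how a changelog could be derived at read time by comparing the table state before and after a compaction. It is an illustration under stated assumptions, not the code in this PR: the class CompactDiffSketch, the diff method, and the string row format are all hypothetical, and values are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration; these names are not part of Paimon.
final class CompactDiffSketch {

    // Compares key -> value state before and after a compaction and
    // returns the changelog rows, ordered by key. Values are assumed non-null.
    static List<String> diff(Map<Integer, String> before, Map<Integer, String> after) {
        // Union of keys from both sides, sorted for deterministic output.
        TreeSet<Integer> keys = new TreeSet<>(before.keySet());
        keys.addAll(after.keySet());

        List<String> changelog = new ArrayList<>();
        for (Integer key : keys) {
            String oldValue = before.get(key);
            String newValue = after.get(key);
            if (oldValue == null) {
                // Key only exists after compaction: insert.
                changelog.add("+I[" + key + ", " + newValue + "]");
            } else if (newValue == null) {
                // Key only existed before compaction: delete.
                changelog.add("-D[" + key + ", " + oldValue + "]");
            } else if (!oldValue.equals(newValue)) {
                // Key exists on both sides with a different value: update pair.
                changelog.add("-U[" + key + ", " + oldValue + "]");
                changelog.add("+U[" + key + ", " + newValue + "]");
            }
            // Equal values produce no changelog row.
        }
        return changelog;
    }
}
```

With before = {1: a, 2: b, 3: c} and after = {1: a, 2: x, 4: d}, the sketch yields -U[2, b], +U[2, x], -D[3, c], +I[4, d]. Key 1 is unchanged and produces no row.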
Tests
- org.apache.paimon.flink.lookup.LookupTableTest#testFullCacheLookupTableWithForceLookup
- org.apache.paimon.flink.lookup.LookupTableTest#testPartialLookupTableWithForceLookup
API and Format
No
Documentation
I commented on the issue. Please think carefully about whether this change adds real value to the business. Is it only to avoid generating the changelog? Is that really the main cost involved?
Currently, your scenario uses partial-update tables as dimension tables:
- 90% of the scenarios are based on primary key lookup joins. The current implementation requires a lookup, which is why you introduced force-lookup.
- 10% of the scenarios are based on non-primary key lookup joins. The current implementation requires a changelog, but you don't want every dimension table to generate a changelog, so this PR aims to address that.
I got it, and I am OK with this.
Could you consider extracting the implementation from Core? This functionality is only intended for Flink Lookup Join.
@JingsongLi hi, please help review it when you have time.