[flink] refactor the code of the lookup and support computing the changelog generated by compact during read time.

Open liming30 opened this issue 1 year ago • 4 comments

Purpose

Linked issue: close #3868

[flink] Refactor the lookup code and support computing, at read time, the changelog generated by compaction.

This applies when changelog-producer is none, but CoreOptions#needLookup is true and the table is used as a dimension (dim) table.
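
As a rough illustration of the condition above, the following self-contained sketch models when the lookup source would have to compute the compaction changelog itself at read time. The class, enum, and method names are hypothetical and are not taken from this PR. Only the three inputs come from the description: the changelog producer being none, CoreOptions#needLookup being true, and the table being used as a dim table.

```java
public class ReadTimeChangelogSketch {

    /** Hypothetical stand-in for the values of the changelog producer option. */
    public enum ChangelogProducer { NONE, INPUT, FULL_COMPACTION, LOOKUP }

    /**
     * Hypothetical helper: returns true only when no changelog files are produced,
     * the table still needs lookup, and the table is read as a dim table.
     * This mirrors the sentence in the description, not the actual PR code.
     */
    public static boolean computeChangelogAtReadTime(
            ChangelogProducer producer, boolean needLookup, boolean usedAsDimTable) {
        return producer == ChangelogProducer.NONE && needLookup && usedAsDimTable;
    }

    public static void main(String[] args) {
        // No changelog producer, lookup needed, used as dim table.
        System.out.println(computeChangelogAtReadTime(ChangelogProducer.NONE, true, true));
        // A changelog producer is configured, so the changelog already exists on disk.
        System.out.println(computeChangelogAtReadTime(ChangelogProducer.LOOKUP, true, true));
        // Lookup is not needed, so the condition in the description does not hold.
        System.out.println(computeChangelogAtReadTime(ChangelogProducer.NONE, false, true));
    }
}
```

In this sketch only the first call returns true; the other two combinations fall outside the case this PR targets.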

Tests

  • org.apache.paimon.flink.lookup.LookupTableTest#testFullCacheLookupTableWithForceLookup
  • org.apache.paimon.flink.lookup.LookupTableTest#testPartialLookupTableWithForceLookup

API and Format

No

Documentation

liming30 avatar Sep 02 '24 07:09 liming30

I commented on the issue, and I want you to really think about whether this change adds real value to the business. Is it just to save on generating the changelog? Is that really the main cost involved?

JingsongLi avatar Sep 03 '24 07:09 JingsongLi

Currently, your scenario involves using a partial-update table for dimension tables:

  1. 90% of the scenarios are primary-key lookup joins. The current implementation requires a lookup, which is why you introduced force-lookup.
  2. 10% of the scenarios are non-primary-key lookup joins. The current implementation requires a changelog, but you don't want every dimension table to generate one, so this PR aims to address that.

I got it, and I am OK with this.

JingsongLi avatar Sep 04 '24 09:09 JingsongLi

Could you consider extracting the implementation from Core? This functionality is only intended for Flink Lookup Join.

JingsongLi avatar Sep 04 '24 09:09 JingsongLi

@JingsongLi hi, please help review it when you have time.

liming30 avatar Sep 11 '24 12:09 liming30