[Bug] lookup with retry is very slow if there are equality in a where clause after the join
Search before asking
- [x] I searched in the issues and found nothing similar.
Paimon version
1.20
Compute Engine
Flink
Minimal reproduce step
I am having an issue with a WHERE condition on a lookup join. Taking the customer and orders example from the documentation, if I want to filter the results of the join by the country=US, it doesn't work. Because it seems to optimizes the SQL by moving the c.country = 'US' clause from the where condition to the join condition and do a lookup join based on both customer id and country. But since many customers are not in US, this lookup fails, and it keeps on retrying.
SELECT /*+ LOOKUP('table'='c', 'retry-predicate'='lookup_miss', 'retry-strategy'='fixed_delay', 'fixed-delay'='1s', 'max-attempts'='600') */
o.order_id, o.total, c.country, c.zip
FROM orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id
WHERE c.country = 'US'
;
I have found a simple workaround , by adding an OR condition to the where clause i.e WHERE c.country = 'US' or c.country = 'dummy' . The OR part of the condition never matches, but adding this prevents the SQL optimization from being used, and the lookup only uses customer id.
What doesn't meet your expectations?
I would expect only columns in the ON condition to be used for the Lookup join, not columns in the WHERE condition.
Anything else?
No response
Are you willing to submit a PR?
- [x] I'm willing to submit a PR!