peloton
peloton copied to clipboard
Duplicate groups generated in the optimizer
As it was reported by @GustavoAngulo , the assertion in https://github.com/cmu-db/peloton/blob/master/src/optimizer/memo.cpp#L46 is not correct because we may generate duplicate groups with different group ID in the optimizer. This is because now we do not generate all logically equivalent group at once. One possible scenario is we generate ((A join B) join C) join D
by one transformation rule and (A join B) join (C join D)
by another rule, currently there's no way to detect these two operator trees are equivalent so they'll be assigned different GroupID
, then we'll have duplicate groups. Later this assertion will fail due to two groups having the same operator tree generated by transformation rules.
@GustavoAngulo Could you provide the SQL trace that breaks the assertion?
You should be able to recreate it with any three way join, or possibly a two way join. I thin running planner_equality_test should cause it. I'm seeing though that the changes from #1245 never made it in, idk why that happened. In which case to trigger the failure you will need to make the fix here
I found that ORCA may be solving duplicated groups problem by merging them, which I think is the correct way to do it.
https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CMemo.cpp#L605
hhttps://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CGroup.cpp#L1010
Interesting. Do you know how Columbia did it? Or they actually didn't handle it...
On Tue, May 22, 2018 at 9:54 AM, Bowei Chen [email protected] wrote:
I found that ORCA may be solving duplicated groups problem by merging them, which I think is the correct way to do it.
https://github.com/greenplum-db/gporca/blob/master/ libgpopt/src/search/CMemo.cpp#L605
https://github.com/greenplum-db/gporca/blob/master/ libgpopt/src/search/CMemo.cpp#L605 https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CGroup.cpp#L1010
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/1354#issuecomment-391063607, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzae-9s51_ley3FIlm027-iqNcDK7ks5t1EK8gaJpZM4T5Ww3 .
I didn't see them handle this problem in their codebase, at least they're not merging groups.
@malin1993ml Do you think it's OK to do it in the similar way as ORCA does it?
I think it’s okay. There’re trade-offs with it, but I don’t have a better approach right now. I think it’s reasonable to proceed with that way. On Tue, May 22, 2018 at 16:28 Bowei Chen [email protected] wrote:
I didn't see them handle this problem in their codebase, at least they're not merging groups.
@malin1993ml https://github.com/malin1993ml Do you think it's OK to do it in the similar way as ORCA does it?
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/1354#issuecomment-391172905, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzbyS2YozIyfl3vHBeGZfXLmC3Uzcks5t1J8wgaJpZM4T5Ww3 .