peloton icon indicating copy to clipboard operation
peloton copied to clipboard

Duplicate groups generated in the optimizer

Open chenboy opened this issue 6 years ago • 5 comments

As it was reported by @GustavoAngulo , the assertion in https://github.com/cmu-db/peloton/blob/master/src/optimizer/memo.cpp#L46 is not correct because we may generate duplicate groups with different group ID in the optimizer. This is because now we do not generate all logically equivalent group at once. One possible scenario is we generate ((A join B) join C) join D by one transformation rule and (A join B) join (C join D) by another rule, currently there's no way to detect these two operator trees are equivalent so they'll be assigned different GroupID, then we'll have duplicate groups. Later this assertion will fail due to two groups having the same operator tree generated by transformation rules.

@GustavoAngulo Could you provide the SQL trace that breaks the assertion?

chenboy avatar May 10 '18 04:05 chenboy

You should be able to recreate it with any three way join, or possibly a two way join. I thin running planner_equality_test should cause it. I'm seeing though that the changes from #1245 never made it in, idk why that happened. In which case to trigger the failure you will need to make the fix here

GustavoAngulo avatar May 10 '18 15:05 GustavoAngulo

I found that ORCA may be solving duplicated groups problem by merging them, which I think is the correct way to do it.

https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CMemo.cpp#L605

hhttps://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CGroup.cpp#L1010

chenboy avatar May 22 '18 16:05 chenboy

Interesting. Do you know how Columbia did it? Or they actually didn't handle it...

On Tue, May 22, 2018 at 9:54 AM, Bowei Chen [email protected] wrote:

I found that ORCA may be solving duplicated groups problem by merging them, which I think is the correct way to do it.

https://github.com/greenplum-db/gporca/blob/master/ libgpopt/src/search/CMemo.cpp#L605

https://github.com/greenplum-db/gporca/blob/master/ libgpopt/src/search/CMemo.cpp#L605 https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CGroup.cpp#L1010

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/1354#issuecomment-391063607, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzae-9s51_ley3FIlm027-iqNcDK7ks5t1EK8gaJpZM4T5Ww3 .

linmagit avatar May 22 '18 20:05 linmagit

I didn't see them handle this problem in their codebase, at least they're not merging groups.

@malin1993ml Do you think it's OK to do it in the similar way as ORCA does it?

chenboy avatar May 22 '18 23:05 chenboy

I think it’s okay. There’re trade-offs with it, but I don’t have a better approach right now. I think it’s reasonable to proceed with that way. On Tue, May 22, 2018 at 16:28 Bowei Chen [email protected] wrote:

I didn't see them handle this problem in their codebase, at least they're not merging groups.

@malin1993ml https://github.com/malin1993ml Do you think it's OK to do it in the similar way as ORCA does it?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/1354#issuecomment-391172905, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzbyS2YozIyfl3vHBeGZfXLmC3Uzcks5t1J8wgaJpZM4T5Ww3 .

linmagit avatar May 23 '18 04:05 linmagit