HIVE-24167: TPC-DS query 14 fails while generating plan for the filter
What changes were proposed in this pull request?
Make it possible for PlanMapper to link related Operators or RelNodes correctly.
For your reference, I made a Gist page explaining the behaviors of PlanMapper and alternative approaches in my mind. https://gist.github.com/okumin/b111fe0a911507bdf6a7204f49b9cb72
See also the PR where I implemented another option, "Combine RelNodes with RelTreeSignature".
Why are the changes needed?
No. 14 of TPC-DS or other complex queries with CTE materialized fail due to this bug. https://issues.apache.org/jira/browse/HIVE-24167
Does this PR introduce any user-facing change?
No. This is a bug fix.
Is the change a dependency upgrade?
No.
How was this patch tested?
I added a new test case, which fails with the current master branch, and restored No. 14 of TPC-DS.
Quality Gate passed
The SonarCloud Quality Gate passed, but some issues were introduced.
1 New issue
0 Security Hotspots
No data about Coverage
No data about Duplication
just seen the details you linked to the PR.
I believe the link should be allowed - but in this the merge should be allowed due to the fact that these things represent the same thing.
I see now why the signature become an obstacle - I believe the SWO was able to match the same plan-parts to only scan the table once; if you don't have link calls there to explain the situation -> that could cause equiv violations here...could you please check that?
@kgyrtkirk
I believe the link should be allowed - but in this the merge should be allowed due to the fact that these things represent the same thing.
I guess this mentions the link on creating a FilterOperator?
I see now why the signature become an obstacle - I believe the SWO was able to match the same plan-parts to only scan the table once; if you don't have link calls there to explain the situation -> that could cause equiv violations here...could you please check that?
I guess the SWO stands for SharedWorkOptimizer. With CTE materialization, each CTE is compiled separately. It means materialized_cte, another_materialized_cte, and the body part are compiled one by one in a dedicated manner. So, SharedWorkOptimizer has no chance to unify TableScanOperators located in multiple CTEs.
Just for a note. We use CTE because we need to support some cases which SharedWorkOptimizer doesn't support. An example is complex CTE + UNION ALL. We may replace CTE materialization with SharedWorkOptimizer when it becomes more mature.
each CTE is compiled separately
got it now - I was thinking its the other way around
I was thinking about this a bit in the last couple days; here is a summary of my thoughts:
- to have this feature cause trouble the following is needed:
- a failing query
- an incorrectly taken stat entry
- the bad stat entry causes further issues
- ....not very likely :D
- keeping multiple equiv groups which should be merged is not fortunate - but may not cause trouble
- at the end of the day all a
Map<Signature, RuntimeStat>will be created from all these details- if there is are 2 groups which have the same
Signaturethat will mean there will be multiple candidates for the key in the map => which means those should have been in a single group...that's why I didn't really like to relax the check on the signature- I believe lookup-by-signatue will either return one a representative group ; even when there are more
- if there is are 2 groups which have the same
from the listed approaches in the doc:
- I liked the best the
RelTreeSignaturerelated; I think that would be the best if possioble- this might be a way to achieve a merge between the 2 groups
- have you tried somehow relax the alias name of
HiveTableScanin theRelTreeSignature?- that class creates the signature of the relnodes 1-by-1 ; so you could access/customize/etc based on the type
- possibly use the
@Signatureannotation - use something to filter out poisonous stuff
@IgnoreSigor something - by-hand stuff
- possibly use the
- that class creates the signature of the relnodes 1-by-1 ; so you could access/customize/etc based on the type
- on the other hand after thinking it thru what might be the outcomes - I find that the idea to not pull the signatures during linking might actually work correctly...
note: you could reachable on the ASF slack as well - we can talk on #hive as well
@kgyrtkirk
I tried the option using RelTreeSignature in the following Pull Request. I still have some concerns, but it will likely work from my perspective.
https://github.com/apache/hive/pull/5077
Could you please take a look and see if it seems feasible? If we prefer #5077 , I will void this PR and try to get it merged.
Let me close this one because we will likely go with #5077 or another approach.