hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-24167: TPC-DS query 14 fails while generating plan for the filter

Open okumin opened this issue 2 years ago • 5 comments

What changes were proposed in this pull request?

Make it possible for PlanMapper to link related Operators or RelNodes correctly.

For your reference, I made a Gist page explaining the behaviors of PlanMapper and alternative approaches in my mind. https://gist.github.com/okumin/b111fe0a911507bdf6a7204f49b9cb72

See also the PR where I implemented another option, "Combine RelNodes with RelTreeSignature".

Why are the changes needed?

No. 14 of TPC-DS or other complex queries with CTE materialized fail due to this bug. https://issues.apache.org/jira/browse/HIVE-24167

Does this PR introduce any user-facing change?

No. This is a bug fix.

Is the change a dependency upgrade?

No.

How was this patch tested?

I added a new test case, which fails with the current master branch, and restored No. 14 of TPC-DS.

okumin avatar Jan 25 '24 08:01 okumin

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

1 New issue
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Jan 29 '24 14:01 sonarqubecloud[bot]

just seen the details you linked to the PR.

I believe the link should be allowed - but in this the merge should be allowed due to the fact that these things represent the same thing. I see now why the signature become an obstacle - I believe the SWO was able to match the same plan-parts to only scan the table once; if you don't have link calls there to explain the situation -> that could cause equiv violations here...could you please check that?

kgyrtkirk avatar Feb 02 '24 11:02 kgyrtkirk

@kgyrtkirk

I believe the link should be allowed - but in this the merge should be allowed due to the fact that these things represent the same thing.

I guess this mentions the link on creating a FilterOperator?

I see now why the signature become an obstacle - I believe the SWO was able to match the same plan-parts to only scan the table once; if you don't have link calls there to explain the situation -> that could cause equiv violations here...could you please check that?

I guess the SWO stands for SharedWorkOptimizer. With CTE materialization, each CTE is compiled separately. It means materialized_cte, another_materialized_cte, and the body part are compiled one by one in a dedicated manner. So, SharedWorkOptimizer has no chance to unify TableScanOperators located in multiple CTEs.

Just for a note. We use CTE because we need to support some cases which SharedWorkOptimizer doesn't support. An example is complex CTE + UNION ALL. We may replace CTE materialization with SharedWorkOptimizer when it becomes more mature.

okumin avatar Feb 02 '24 15:02 okumin

each CTE is compiled separately

got it now - I was thinking its the other way around

I was thinking about this a bit in the last couple days; here is a summary of my thoughts:

  • to have this feature cause trouble the following is needed:
    • a failing query
    • an incorrectly taken stat entry
    • the bad stat entry causes further issues
    • ....not very likely :D
  • keeping multiple equiv groups which should be merged is not fortunate - but may not cause trouble
  • at the end of the day all a Map<Signature, RuntimeStat> will be created from all these details
    • if there is are 2 groups which have the same Signature that will mean there will be multiple candidates for the key in the map => which means those should have been in a single group...that's why I didn't really like to relax the check on the signature
      • I believe lookup-by-signatue will either return one a representative group ; even when there are more

from the listed approaches in the doc:

  • I liked the best the RelTreeSignature related; I think that would be the best if possioble
    • this might be a way to achieve a merge between the 2 groups
    • have you tried somehow relax the alias name of HiveTableScan in the RelTreeSignature ?
      • that class creates the signature of the relnodes 1-by-1 ; so you could access/customize/etc based on the type
        • possibly use the @Signature annotation
        • use something to filter out poisonous stuff @IgnoreSig or something
        • by-hand stuff
  • on the other hand after thinking it thru what might be the outcomes - I find that the idea to not pull the signatures during linking might actually work correctly...

note: you could reachable on the ASF slack as well - we can talk on #hive as well

kgyrtkirk avatar Feb 05 '24 10:02 kgyrtkirk

@kgyrtkirk I tried the option using RelTreeSignature in the following Pull Request. I still have some concerns, but it will likely work from my perspective. https://github.com/apache/hive/pull/5077

Could you please take a look and see if it seems feasible? If we prefer #5077 , I will void this PR and try to get it merged.

okumin avatar Feb 11 '24 13:02 okumin

Let me close this one because we will likely go with #5077 or another approach.

okumin avatar Apr 08 '24 12:04 okumin