gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

Support scalar subquery with multiple columns

Open pobbatihari opened this issue 1 year ago • 1 comments

Description: This PR introduces support for scalar subqueries(BOOLEXPR) with multiple columns in Greenplum, enhancing the query processing capabilities of the system. Which allow for more complex and versatile SQL queries by enabling the use of subqueries that return multiple columns.

Key Features:

  • Implemented parsing, processing and exploration logic for scalar subqueries with multiple columns.
  • Ensured compatibility with existing Greenplum query features and optimizations, maintaining performance and reliability.
  • Included comprehensive test cases to validate the functionality and correctness of scalar subquery with multiple columns.
  • This enhancement expands Greenplum's capabilities, enabling users to write more expressive and powerful SQL queries for a wider range of use cases.

Limitations:

  • This PR supports scalar subqueries of the sublink type (BOOLEXPR) exclusively.
  • scalar correlated subqueries with multiple columns are not supported (fallback to planner) due to the lack of maintenance of the original column order specified in the query across optimization stages. This can lead to the generation of incorrect subplans.
  • HashLASJ plans are not generated for scalar subqueries with multiple oclumns of type NOT IN because observed HashLASJ plans yielding incorrect results with NULL tuples. Hence, HashLASJ is blocked for such queries, and a JIRA ticket has been created to address this issue before enabling HashLASJ for scalar subqueries multiple columns.
  • Generation of large filter plans for scalar subqueries with multiple columns, which done in XformUtils:SubqueryAnyToAgg() and XformUtils:SubqueryAllToJoin(), has been stopped. This decision was made because it becomes challenging to generate/evaluate large filters when the number of columns increases. As discussed, the generation of these plans has been halted for now.

Testcoverage:

  • Test cases for the subquery type NOT IN have not been added as most of the scenarios are already covered in the existing test suites, such as notin.sql and qp_subquery.sql
  • Included regression and MDP test cases for scalar subqueries with multiple columns of types ALL, ANY, and IN

pobbatihari avatar Apr 08 '24 11:04 pobbatihari