noisepage icon indicating copy to clipboard operation
noisepage copied to clipboard

TPC-H : Nested query with an exists or not exists in a where clause does not work

Open gautam20197 opened this issue 4 years ago • 1 comments

Feature Request

Summary

A query with a nested query in the where clause being used in an exists or not exists does not work.

Environment

I don't think it's an environment issue. But I'm running it on WSL2 on windows with Ubuntu 20.04.

Steps to Reproduce

  1. Compile with the standard args for RELEASE mentioned in the readme.
  2. Run noisepage executable.
  3. Connect to the terrier terminal by executing : psql -h localhost -p 15721 -d noisepage
  4. Create the tables from "https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/ddls/tpch-postgres-ddl.sql" . A .sql file can be easily run using \i tpch-postgres-ddl.sql; in the noisepage terminal.
  5. Run the query
select
        o_orderpriority,
        count(*) as order_count
from
        orders
where
        o_orderdate >= '1993-07-01'
        and o_orderdate < '1993-10-01'
        and exists (
                select
                        *
                from
                        lineitem
                where
                        l_orderkey = o_orderkey
                        and l_commitdate < l_receiptdate
        )
group by
        o_orderpriority
order by
        o_orderpriority;

Error:
 terminate called after throwing an instance of 'noisepage::NotImplementedException'
  what():  Array in predicates not supported
Aborted (core dumped)

Initial analysis

The array in predicates error message is present in the optimizer. Clearly we are not being able to make a plan for the query in the optimizer. In the file tpch_query.cpp, there is a hand made plan under the function "MakeExecutableQ4". That plan can be used to understand the expected behavior of the optimizer.

Moreover the query can be rewritten to

select
        o_orderpriority,
        count(*) as order_count
from
        orders,
        (select l_orderkey as c from lineitem where l_commitdate < l_receiptdate GROUP BY l_orderkey) as x
where
        o_orderdate >= date '1993-07-01'
        and o_orderdate < date '1993-10-01'
        and x.c = o_orderkey
group by
        o_orderpriority
order by
        o_orderpriority

This query works and therefore there is functionality in the system to have such a query running.

I would suggest throwing out all the predicates, group by and order by in the query while working on the issue and using the following query to look for the problem.

select
        o_orderpriority
from
        orders
where
       	exists (
                select
                        *
                from
                        lineitem
                where
                        l_orderkey = o_orderkey
        );

gautam20197 avatar Nov 30 '20 23:11 gautam20197

More information on the error:

  • The exists clause has a subselect expression that corresponds to the select statement inside it.
  • While generating the subquery tree of the subselect we seem to support a subselect which has only one column as output. In the given query we clearly get more columns than 1.

gautam20197 avatar Dec 01 '20 18:12 gautam20197