opteryx icon indicating copy to clipboard operation
opteryx copied to clipboard

🪲 TPC-H 12 failure

Open joocer opened this issue 1 year ago • 0 comments

Thank you for taking the time to report a problem with Opteryx.

To help us to respond to your request we ask that you try to provide the below detail about the bug.

Describe the bug A clear and specific description of what the bug is. What the error, incorrect or unexpected behaviour was.

  • [ ] 1) fails with error Unable to locate column 'l_receiptdate'
  • [ ] 2) plan misses two optimizations (see bottom)

Sample Code/Statement If you can, please submit the SQL statement or Python code snippet, or a representative example using the sample datasets.

select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
            then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
            then 1
        else 0
    end) as low_line_count
from
    data.tpch.orders,
    data.tpch.lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('REG AIR', 'MAIL')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= '1995-01-01'
    and l_receiptdate < '1996-01-01'
group by
    l_shipmode
order by
    l_shipmode;

Additional context Add any other context about the problem here, for example what you have done to try to diagnose or workaround the problem.

AFTER OPTIMIZATION
└─ EXIT
   └─ ORDER BY (l_shipmode)
      └─ PROJECT (SUM(CASE WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH' THEN 1 WHEN True THEN 0 END), SUM(CASE WHEN o_orderpriority != '1-URGENT' AND o_orderpriority != '2-HIGH' THEN 1 WHEN True THEN 0 END), l_shipmode)
         └─ AGGREGATE (SUM(CASE WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH' THEN 1 WHEN True THEN 0 END), SUM(CASE WHEN o_orderpriority != '1-URGENT' AND o_orderpriority != '2-HIGH' THEN 1 WHEN True THEN 0 END)) GROUP BY (l_shipmode)
            └─ FILTER (o_orderkey = l_orderkey)
               └─ FILTER (l_commitdate < l_receiptdate)
                  └─ FILTER (l_shipdate < l_commitdate)
                     └─ CROSS JOIN 
                        ├─ READ (data.tpch.orders) [o_orderkey, o_orderpriority]
                        └─ FILTER (l_shipmode INLIST {'MAIL', 'REG AIR'})
                           └─ READ (data.tpch.lineitem) [l_orderkey, l_shipdate, l_commitdate, l_receiptdate, l_shipmode] (l_receiptdate >= 1995-01-01 AND l_receiptdate < 1996-01-01)

The filters aren't pushed far enough.

  • FILTER (o_orderkey = l_orderkey) should be pushed to the CROSS JOIN to make that an INNER JOIN
  • FILTER (l_shipdate < l_commitdate) and FILTER (l_commitdate < l_receiptdate) should be pushed to just after the READ, before the CROSS JOIN

joocer avatar Aug 24 '24 13:08 joocer