hyper-api-samples icon indicating copy to clipboard operation
hyper-api-samples copied to clipboard

Query 18 spill to disk and take a lot of space

Open djouallah opened this issue 1 year ago • 1 comments

I am trying to run TPCH-SF100 on my laptop, the good news, I can build the hyper file easily, the test run very well except Query 18 which timeout as I don't have enough empty space on my laptop , I saw some people sort lineitem but I can't do it on my laptop as it time out too

defining FK and PK did not help

SELECT
    --Query18
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    SUM(l_quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    o_orderkey IN (
        SELECT
            l_orderkey
        FROM
            lineitem
        GROUP BY
            l_orderkey
        HAVING
            SUM(l_quantity) > 300
    )
    AND c_custkey = o_custkey
    AND o_orderkey = l_orderkey
GROUP BY
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
ORDER BY
    o_totalprice DESC,
    o_orderdate
LIMIT
    100;

djouallah avatar Mar 23 '23 12:03 djouallah

Thank you for this report! And thanks for benchmarking Hyper! Benchmarks results are very valuable for us to understand where Hyper should still be improved. In particular, if those benchmarks are done by people outside the Hyper team, that also helps to confirm the reproduceability of Hyper's performance 🙂

In this particular case, it's probably good that Hyper isn't able to successfully finish the query - finishing it with spooling would be horribly slow. In general, I would recommend to just disable spooling altogether... I prefer a quick error message over a very long-running query.

The problem for this query is that the subquery

SELECT
    l_orderkey
FROM
    lineitem
GROUP BY
    l_orderkey
HAVING
    SUM(l_quantity) > 300

runs out of memory. The GROUP BY l_orderkey needs to keep too much state in memory.

We will have to see when we get to this. Fixing TPC-H queries is currently not our primary business priority, but as performance enthusiasts, we are of course personally interested in unblocking TPC-H Q18

vogelsgesang avatar Mar 27 '23 16:03 vogelsgesang