
ORDER BY expressions containing aggregation functions are not handled appropriately by the analyzer.

Open reltuk opened this issue 3 years ago • 2 comments

The following works because the SUM gets mapped directly to the projected SUM expression (by resolve_columns):

SELECT category, SUM(price) FROM products GROUP BY category ORDER BY SUM(price) ASC

The following does not work, because the SUM(price) (sub)expression currently sticks around in the SortFields of the plan.Sort node, but it cannot evaluate correctly in this context:

SELECT category, SUM(price) FROM products GROUP BY category ORDER BY SUM(price) + 1 ASC

Other things that should generally work but do not:

SELECT category, SUM(price) FROM products GROUP BY category ORDER BY AVG(price) ASC

SELECT category, SUM(price) FROM products GROUP BY category ORDER BY COUNT(*) ASC

SELECT category, SUM(price) FROM products GROUP BY category ORDER BY SUM(price) % 2, SUM(price), AVG(price) ASC

In general, a correct way to handle an aggregation in a sort expression is to push the expression down to the group by node, replace the expression with an appropriately indexed GetField in the sort node itself, and project the expression away in a projection above the Sort node. The logic to do that does not currently exist in the analyzer.
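
The three-step rewrite described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the analyzer's code: the `products` rows and the `group_by`, `sort`, `project`, and `run_query` helpers are hypothetical stand-ins for the GroupBy, Sort, and Project plan nodes, applied to `ORDER BY SUM(price) + 1`.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the products table: (category, price).
products = [
    ("toys", 5), ("toys", 4), ("books", 2), ("books", 1), ("food", 20),
]

def group_by(rows):
    """GroupBy stand-in: emits (category, SUM(price), SUM(price) + 1).

    The third column is the ORDER BY expression pushed down from the sort,
    so the aggregation is evaluated where it can actually be computed.
    """
    sums = defaultdict(int)
    for category, price in rows:
        sums[category] += price
    return [(category, total, total + 1) for category, total in sums.items()]

def sort(rows, field_index):
    """Sort stand-in: orders by a plain indexed field (the GetField analogue)."""
    return sorted(rows, key=lambda row: row[field_index])

def project(rows, width):
    """Project stand-in: keeps the first `width` columns, dropping the sort key."""
    return [row[:width] for row in rows]

def run_query(rows):
    # GroupBy at the bottom, Sort in the middle, Project on top.
    return project(sort(group_by(rows), field_index=2), width=2)

result = run_query(products)
```

The Sort step never sees an aggregation here; it only reads column 2 of the rows produced by the grouping step, and the final projection hides that extra column from the result.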

This issue might also apply to window functions, but I have not investigated that yet.

For now, we are going to add a validation step that looks for aggregations outside of GroupBy expressions. If they exist, the query is unsupported and we will return an error.
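
As a rough sketch of what such a validation step could look like, the snippet below walks a toy plan tree and rejects any aggregation found outside a GroupBy node. The `Expr` and `Node` classes, the `is_aggregation` flag, and the error text are all hypothetical and are not taken from the analyzer.

```python
from dataclasses import dataclass, field

@dataclass
class Expr:
    """Hypothetical expression node; is_aggregation marks SUM, AVG, COUNT, etc."""
    name: str
    children: list = field(default_factory=list)
    is_aggregation: bool = False

@dataclass
class Node:
    """Hypothetical plan node; kind is e.g. "GroupBy", "Sort", or "Project"."""
    kind: str
    exprs: list = field(default_factory=list)
    children: list = field(default_factory=list)

def contains_aggregation(expr):
    """True if expr or any of its subexpressions is an aggregation."""
    if expr.is_aggregation:
        return True
    return any(contains_aggregation(child) for child in expr.children)

def validate(node):
    """Raise ValueError if an aggregation appears outside a GroupBy node."""
    if node.kind != "GroupBy":
        for expr in node.exprs:
            if contains_aggregation(expr):
                raise ValueError(
                    f"aggregation remained in {expr.name!r} outside of a "
                    "GroupBy node; this query is unsupported"
                )
    for child in node.children:
        validate(child)

# Plan for: ... GROUP BY category ORDER BY SUM(price) + 1
sum_price = Expr("SUM(price)", is_aggregation=True)
unsupported_plan = Node(
    "Sort",
    exprs=[Expr("SUM(price) + 1", children=[sum_price, Expr("1")])],
    children=[Node("GroupBy", exprs=[Expr("category"), sum_price])],
)
```

Calling `validate(unsupported_plan)` raises, because the Sort node's expression still contains `SUM(price)` as a subexpression. A plan whose Sort node only references already-projected fields passes.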

reltuk, Sep 08 '21 18:09

This is still broken as of Dolt 0.50.8.

max-hoffman, Dec 01 '22 17:12

repro:

> create table xy (x int primary key, y int);

> select x, sum(y) from xy group by x order by avg(y);
column "AVG(xy.y)" could not be found in any table in scope

> select x, sum(y) from xy group by x order by sum(y)+1;
an aggregation remained in the expression '(SUM(xy.y) + 1)' after analysis, outside of a node capable of evaluating it; this query is currently unsupported.

max-hoffman, Dec 01 '22 17:12

These work now :-)

test_subra/main*> create table xy (x int primary key, y int);
test_subra/main*> select x, sum(y) from xy group by x order by avg(y);
Empty set (0.00 sec)

test_subra/main*> select x, sum(y) from xy group by x order by sum(y)+1;
Empty set (0.00 sec)

test_subra/main*>

timsehn, Apr 15 '24 22:04