ORDER BY expressions containing aggregation functions are not handled appropriately by the analyzer.
The following works because the SUM gets mapped directly to the projected SUM expression (by resolve_columns):
SELECT category, SUM(price) FROM products GROUP BY category ORDER BY SUM(price) ASC
The following does not work, because the SUM(price) (sub)expression currently sticks around in the SortFields of the plan.Sort node, but it cannot evaluate correctly in this context:
SELECT category, SUM(price) FROM products GROUP BY category ORDER BY SUM(price) + 1 ASC
Other things that should generally work but do not:
SELECT category, SUM(price) FROM products GROUP BY category ORDER BY AVG(price) ASC
SELECT category, SUM(price) FROM products GROUP BY category ORDER BY COUNT(*) ASC
SELECT category, SUM(price) FROM products GROUP BY category ORDER BY SUM(price) % 2, SUM(price), AVG(price) ASC
In general, a correct way to handle an aggregation in a sort expression is to push the expression down to the group by node, replace the expression with an appropriately indexed GetField in the sort node itself, and project the expression away in a projection above the Sort node. The logic to do that does not currently exist in the analyzer.
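The rewrite described above can be pictured at the SQL level. The sketch below is only an illustration, not analyzer code: it runs against Python's built-in sqlite3 module rather than Dolt, the sample rows are made up, and the names grouped, total, and sort_key are hypothetical. The inner query plays the role of the GroupBy node with the sort expression pushed down, the outer ORDER BY refers to that column (roughly what a GetField would do), and the outer SELECT list projects the sort key away.

```python
import sqlite3

# In-memory SQLite database with made-up sample data (not Dolt).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("a", 5), ("a", 7), ("b", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 20)],
)

# Original form: the aggregation AVG(price) appears only in ORDER BY.
direct = conn.execute(
    "SELECT category, SUM(price) FROM products "
    "GROUP BY category ORDER BY AVG(price) ASC"
).fetchall()

# Hand-written equivalent of the pushdown:
#   1. compute the sort expression next to the other aggregates (inner query),
#   2. sort on the resulting column instead of on the aggregation,
#   3. leave the sort column out of the final projection (outer SELECT list).
rewritten = conn.execute(
    "SELECT category, total FROM ("
    "  SELECT category, SUM(price) AS total, AVG(price) AS sort_key"
    "  FROM products GROUP BY category"
    ") AS grouped ORDER BY sort_key ASC"
).fetchall()

print(direct)
print(rewritten)
```

Both queries should return the same rows in the same order, which is the property an analyzer rule implementing the pushdown would need to preserve.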
This issue might also apply to window functions, but I have not investigated that yet.
For now, we are going to add a validation step that looks for aggregations outside of GroupBy expressions. If they exist, the query is unsupported and we will return an error.
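A minimal sketch of what such a validation pass could look like, written against a toy plan tree rather than the real analyzer. Everything here is an assumption made for illustration: the Expr and Node classes, the AGGREGATES set, and the validate function are hypothetical and do not correspond to actual types in the codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Toy expression: a name (function, operator, column, literal) plus operands."""
    name: str
    children: List["Expr"] = field(default_factory=list)


@dataclass
class Node:
    """Toy plan node: a kind such as "GroupBy" or "Sort", its expressions, and child nodes."""
    kind: str
    exprs: List[Expr] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# Hypothetical list of aggregation function names for this sketch.
AGGREGATES = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


def contains_aggregate(expr: Expr) -> bool:
    """Return True if the expression or any sub-expression is an aggregation."""
    if expr.name in AGGREGATES:
        return True
    return any(contains_aggregate(child) for child in expr.children)


def validate(node: Node) -> None:
    """Raise ValueError if an aggregation appears in any node other than a GroupBy."""
    if node.kind != "GroupBy":
        for expr in node.exprs:
            if contains_aggregate(expr):
                raise ValueError(
                    f"aggregation found in a {node.kind} node; query is unsupported"
                )
    for child in node.children:
        validate(child)


# ORDER BY SUM(price) + 1: the aggregation is left behind in the Sort node.
sum_price = Expr("SUM", [Expr("price")])
group_by = Node("GroupBy", [Expr("category"), sum_price])
bad_plan = Node("Sort", [Expr("+", [sum_price, Expr("1")])], [group_by])

# ORDER BY SUM(price): the sort field was already replaced by a field reference.
good_plan = Node("Sort", [Expr("GetField(1)")], [group_by])
```

In this toy model, validate(good_plan) returns without complaint, while validate(bad_plan) raises because the SUM survives inside the Sort node's expression.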
This is still broken as of Dolt 0.50.8.
repro:
> create table xy (x int primary key, y int);
> select x, sum(y) from xy group by x order by avg(y);
column "AVG(xy.y)" could not be found in any table in scope
> select x, sum(y) from xy group by x order by sum(y)+1;
an aggregation remained in the expression '(SUM(xy.y) + 1)' after analysis, outside of a node capable of evaluating it; this query is currently unsupported.
These work now :-)
test_subra/main*> create table xy (x int primary key, y int);
test_subra/main*> select x, sum(y) from xy group by x order by avg(y);
Empty set (0.00 sec)
test_subra/main*> select x, sum(y) from xy group by x order by sum(y)+1;
Empty set (0.00 sec)
test_subra/main*>