prql
prql copied to clipboard
Table vs Value types
@qorrect raises an interesting point on a Discord thread.
Something I find frustrating about SQL is that a query result can have three types:
- a table
- a column
- a value — one column, one row
Some queries implicitly require one type. For example, in:
SELECT name
FROM facts
WHERE population = (SELECT MIN(population) FROM facts);
...the subquery SELECT MIN(population) FROM facts has to be a Value. But it's difficult to know that before running the query
Here it's a full aggregation, so it is possible to statically check that, but it could be
select population where country = "USA"which is ambiguous. And I'm not aware of live feedback on SQL tools to show that error even in the unambiguous case
Here's the PRQL of that query, separated into a table for effect:
table min_pop = (
facts
aggregate [min population]
)
from facts
select name
filter population = min_pop
We could make this be statically checked, by defining a value type:
-table min_pop = (
+value min_pop = (
facts
aggregate [min population]
)
...and then having aggregate [min population, max population] would fail at "compile time".
Ambiguous subqueries such as filter country = "USA" could require a take 1 to guarantee it's a single value.
Love the idea, and I if we can error out at compile time on things like if the select contains more than one field ( for value types ) that would be great. Really liking the compile time checks.
Reflecting on #478, there may be a 4th type — a "Row" type — with multiple columns but only one row. Modifying the example from above:
SELECT name
FROM facts
WHERE population = (SELECT population FROM min_population);
...here min_population needs to be a single row, but doesn't need to be a scalar.