prql
prql copied to clipboard
Approach for expressions in `from`
Currently from s"<s-string>" is not allowed. We'd like to make this possible, both for https://github.com/prql/prql/issues/918, and because it's a general & orthogonal language feature.
As part of that we can also allow broader expressions, such as from (from y). While I don't see this being that useful, it would be useful to generally extend expressions to everywhere; e.g. in sort & group; so sort (foo * 2) works.
There's some tradeoff between this and Semantic — if we have an s-string there, then we can't really understand where columns come from.
Any thoughts on this tradeoff? Do we have a mode in Semantic that basically shrugs and says "I dunno"?
Unfortunately, I don't think this is possible.
From requires an ident, because it is later used as prefix for all column names. I.e.
from employees
select age
When age is encountered, it is resolved as employees.age. And if later another age is encountered, it is also translated to SQL as such:
table buildings = (from buildings | select [age, building_id])
from employees
select [age, building_id]
join buildings [building_id]
Even though employees.age was referenced only as age, it will translate to employees.age. That's why we need table name as a single ident.
Just a peek behind the scenes: From is actually an identity function. It's only function is that it resolves its argument in "table namespace" - so it cannot resolve into a column or a function.
Right, very good point.
One approach that is a) awkward and b) only covers some cases — is to make a CTE out of the s-string:
from s"SELECT * FROM employees WHERE foo > 5"
would compile to:
with _31 as (
SELECT * FROM employees WHERE foo > 5
)
SELECT * FROM _31
...and then the columns are on _31, like _31.age not on the s-string.
But it only covers some cases — it doesn't cover the dbt case:
from s'"foo"."employees"'
...which needs to be treated like the literal "foo"."employees", so we get:
SELECT * from "foo"."employees".age
We could make a guess (i.e. does it have a SELECT word), but then it feels like we're tripping over ourselves because of a bad abstraction.
I still think it's worth doing this for sort / group / etc — there are far fewer tradeoffs there, AFAIK
Yes, sort and group are a differnet story. Similar, but possible to implement.
from employees
group age % 10 (aggregate count)
... would resolve to:
from employees
derive _tmp = age % 10
group _tmp (aggregate count)
... which would translate to:
with _table_0 = (select age % 10 as tmp, * from employees)
select _tmp, count(*)
from _table_0
group by _tmp
Oh wait, SQL does support expressions in GROUP BY! It's even easier.
Ah great! Though ORDER BY needs the adjustment above (I have been trying to get this to work, but in short bursts again...)
This was closed by #1197
Both:
from employees
group age % 10 (aggregate count)
... and ...
from s"SELECT * FROM employees WHERE foo > 5"
... now compile correctly.
That's so awesome, again, @aljazerzen !
Maybe I add some examples to the book?