prql icon indicating copy to clipboard operation
prql copied to clipboard

Approach for expressions in `from`

Open max-sixty opened this issue 3 years ago • 5 comments
trafficstars

Currently from s"<s-string>" is not allowed. We'd like to make this possible, both for https://github.com/prql/prql/issues/918, and because it's a general & orthogonal language feature.

As part of that we can also allow broader expressions, such as from (from y). While I don't see this being that useful, it would be useful to generally extend expressions to everywhere; e.g. in sort & group; so sort (foo * 2) works.

There's some tradeoff between this and Semantic — if we have an s-string there, then we can't really understand where columns come from.

Any thoughts on this tradeoff? Do we have a mode in Semantic that basically shrugs and says "I dunno"?

max-sixty avatar Aug 08 '22 04:08 max-sixty

Unfortunately, I don't think this is possible.

From requires an ident, because it is later used as prefix for all column names. I.e.

from employees
select age

When age is encountered, it is resolved as employees.age. And if later another age is encountered, it is also translated to SQL as such:

table buildings = (from buildings | select [age, building_id])

from employees
select [age, building_id]
join buildings [building_id]

Even though employees.age was referenced only as age, it will translate to employees.age. That's why we need table name as a single ident.

Just a peek behind the scenes: From is actually an identity function. It's only function is that it resolves its argument in "table namespace" - so it cannot resolve into a column or a function.

aljazerzen avatar Aug 10 '22 10:08 aljazerzen

Right, very good point.

One approach that is a) awkward and b) only covers some cases — is to make a CTE out of the s-string:

from s"SELECT * FROM employees WHERE foo > 5"

would compile to:

with _31 as (
    SELECT * FROM employees WHERE foo > 5
)
SELECT * FROM _31

...and then the columns are on _31, like _31.age not on the s-string.


But it only covers some cases — it doesn't cover the dbt case:

from s'"foo"."employees"'

...which needs to be treated like the literal "foo"."employees", so we get:

SELECT * from "foo"."employees".age

We could make a guess (i.e. does it have a SELECT word), but then it feels like we're tripping over ourselves because of a bad abstraction.


I still think it's worth doing this for sort / group / etc — there are far fewer tradeoffs there, AFAIK

max-sixty avatar Aug 11 '22 00:08 max-sixty

Yes, sort and group are a differnet story. Similar, but possible to implement.

from employees
group age % 10 (aggregate count)

... would resolve to:

from employees
derive _tmp = age % 10
group _tmp (aggregate count)

... which would translate to:

with _table_0 = (select age % 10 as tmp, * from employees)
select _tmp, count(*)
from _table_0
group by _tmp

aljazerzen avatar Aug 11 '22 08:08 aljazerzen

Oh wait, SQL does support expressions in GROUP BY! It's even easier.

aljazerzen avatar Aug 11 '22 08:08 aljazerzen

Ah great! Though ORDER BY needs the adjustment above (I have been trying to get this to work, but in short bursts again...)

max-sixty avatar Aug 11 '22 16:08 max-sixty

This was closed by #1197

Both:

from employees
group age % 10 (aggregate count)

... and ...

from s"SELECT * FROM employees WHERE foo > 5"

... now compile correctly.

aljazerzen avatar Dec 01 '22 11:12 aljazerzen

That's so awesome, again, @aljazerzen !

Maybe I add some examples to the book?

max-sixty avatar Dec 02 '22 07:12 max-sixty