prql
prql copied to clipboard
Table name and transform name collisions
What happened?
I have a table named group, which I tried to query. I tried using backticks (as suggested in the docs: https://prql-lang.org/book/reference/stdlib/transforms/from.html), which doesn't seem to work.
PRQL input
from `group`
filter id == $1
SQL output
No output
Expected SQL output
No response
MVCE confirmation
- [X] Minimal example
- [X] New issue
Anything else?
Exact error:
╭─[:2:1]
│
2 │ filter id == $1
│ ───────┬───────
│ ╰───────── main expected type `relation`, but found type `infer -> infer`
│
│ Help: Have you forgotten an argument to function main?
│
│ Note: Type `relation` expands to `[tuple_of_scalars]`
───╯
Thanks @FinnRG . Yes, repro-ed. And a particularly bad error message...
As a workaround in the meantime, this works:
from s"SELECT * from group"
WITH table_0 AS (
SELECT
*
from
group
)
SELECT
*
FROM
table_0
-- Generated by PRQL compiler version:0.9.3 (https://prql-lang.org)
This is expected behavior. An unfortunate one, but a compound of how we do name resolution and table inference.
So the problem is that identifier group matches std.group (the group function) instead of default_db.group (the table declaration).
The solution is this:
from default_db.group
filter id == $1
... except that now, this will compile to SELECT * FROM default_db.group ..., because this will infer that we are referring to default_db.default_db.group.
The real solution is:
default_db.group
filter id == $1
... which bypasses from function and its behavior of defaulting to default_db module.
Completely abstract of the current implementation, I do think that's a bit unfriendly / inconsistent.
I recognize we do need to make some tradeoffs because of PRQL's use of "bare words" — there'll always be some ambiguity. But (thinking on the fly here, not super confident):
- there would be a robust way which machines could write — e.g. in column selection, we can always have
this.foo, thenthis.timewill resolve to a column even whentimeresolves to a type. - the robust way would follow from the less robust way. My objection to the current implementation is that if a machine wanted to generate PRQL, it would generate
default_db.foorather thanfrom foo, which then doesn't look much like PRQL.
How about we extend the this paradigm to table selection too? So from this.group gets the group table?
We'd need to discriminate between scopes:
- where there's an implicit relation — e.g. within a
derive, wherethisrefers to a relation - where the top scope is
default_db— e.g. within afrom, wherethisrefers to the default db
(other languages have slightly complicated rules around this, e.g. jq and go templates use $ to jump scope levels IIRC)
See also: #2155, #1619
Could I open a PR to add the default_db.group solution to the documentation, or is this a non-standard/implementation workaround? Personally I think the this solution would be good too, especially since this was one of the solutions I tried after the backticks didn't work as expected.
Thanks for the offer re a PR for the docs — we'd def accept it.
If we do that, would be great to link to this issue and caveat — we don't think the current state is great, but this is a workaround...
if a machine wanted to generate PRQL, it would generate default_db.foo rather than from foo, which then doesn't look much like PRQL.
That tells me that we should either:
- change the fact that tables are regular declarations in modules and make them something special, so we can have special name resolution rules for
fromandjoinfunctions, or - remove the
fromfunction and embrace the pipeline:
(we could have something else in place ofdefault_db.my_table select {a, b}default_db, maybetables?)
WDYT about having something like this also operate on tables, like from this.group to refer to a table named group? In the same way that derive x = this.group refers to a column named group?
this could also be tables if that's simpler — otherwise it could be quite unclear what this refers to in a join.
But is there a need to discard the from there?
- remove the
fromfunction and embrace the pipeline:default_db.my_table select {a, b}
(we could have something else in place ofdefault_db, maybetables?)
This was the very first proposal for PRQL — no from. But folks didn't like it — I think because having each line start with the verb of what it's doing is nice & consistent...
Coming late to this discussion but will give it some thought as well.