prql icon indicating copy to clipboard operation
prql copied to clipboard

Table name and transform name collisions

Open FinnRG opened this issue 2 years ago • 8 comments

What happened?

I have a table named group, which I tried to query. I tried using backticks (as suggested in the docs: https://prql-lang.org/book/reference/stdlib/transforms/from.html), which doesn't seem to work.

PRQL input

from `group`
filter id == $1

SQL output

No output

Expected SQL output

No response

MVCE confirmation

  • [X] Minimal example
  • [X] New issue

Anything else?

Exact error:

   ╭─[:2:1]
   │
 2 │ filter id == $1
   │ ───────┬───────  
   │        ╰───────── main expected type `relation`, but found type `infer -> infer`
   │ 
   │ Help: Have you forgotten an argument to function main?
   │ 
   │ Note: Type `relation` expands to `[tuple_of_scalars]`
───╯

FinnRG avatar Aug 06 '23 21:08 FinnRG

Thanks @FinnRG . Yes, repro-ed. And a particularly bad error message...

As a workaround in the meantime, this works:

from s"SELECT * from group"
WITH table_0 AS (
  SELECT
    *
  from
    group
)
SELECT
  *
FROM
  table_0

-- Generated by PRQL compiler version:0.9.3 (https://prql-lang.org)

max-sixty avatar Aug 06 '23 22:08 max-sixty

This is expected behavior. An unfortunate one, but a compound of how we do name resolution and table inference.

So the problem is that identifier group matches std.group (the group function) instead of default_db.group (the table declaration).

The solution is this:

from default_db.group
filter id == $1

... except that now, this will compile to SELECT * FROM default_db.group ..., because this will infer that we are referring to default_db.default_db.group.

The real solution is:

default_db.group
filter id == $1

... which bypasses from function and its behavior of defaulting to default_db module.

aljazerzen avatar Aug 07 '23 17:08 aljazerzen

Completely abstract of the current implementation, I do think that's a bit unfriendly / inconsistent.

I recognize we do need to make some tradeoffs because of PRQL's use of "bare words" — there'll always be some ambiguity. But (thinking on the fly here, not super confident):

  • there would be a robust way which machines could write — e.g. in column selection, we can always have this.foo, then this.time will resolve to a column even when time resolves to a type.
  • the robust way would follow from the less robust way. My objection to the current implementation is that if a machine wanted to generate PRQL, it would generate default_db.foo rather than from foo, which then doesn't look much like PRQL.

How about we extend the this paradigm to table selection too? So from this.group gets the group table?

We'd need to discriminate between scopes:

  • where there's an implicit relation — e.g. within a derive, where this refers to a relation
  • where the top scope is default_db — e.g. within a from, where this refers to the default db

(other languages have slightly complicated rules around this, e.g. jq and go templates use $ to jump scope levels IIRC)


See also: #2155, #1619

max-sixty avatar Aug 07 '23 17:08 max-sixty

Could I open a PR to add the default_db.group solution to the documentation, or is this a non-standard/implementation workaround? Personally I think the this solution would be good too, especially since this was one of the solutions I tried after the backticks didn't work as expected.

FinnRG avatar Aug 07 '23 20:08 FinnRG

Thanks for the offer re a PR for the docs — we'd def accept it.

If we do that, would be great to link to this issue and caveat — we don't think the current state is great, but this is a workaround...

max-sixty avatar Aug 07 '23 20:08 max-sixty

if a machine wanted to generate PRQL, it would generate default_db.foo rather than from foo, which then doesn't look much like PRQL.

That tells me that we should either:

  • change the fact that tables are regular declarations in modules and make them something special, so we can have special name resolution rules for from and join functions, or
  • remove the from function and embrace the pipeline:
    default_db.my_table
    select {a, b}
    
    (we could have something else in place of default_db, maybe tables?)

aljazerzen avatar Aug 14 '23 17:08 aljazerzen

WDYT about having something like this also operate on tables, like from this.group to refer to a table named group? In the same way that derive x = this.group refers to a column named group?

this could also be tables if that's simpler — otherwise it could be quite unclear what this refers to in a join.

But is there a need to discard the from there?


  • remove the from function and embrace the pipeline: default_db.my_table select {a, b}
    (we could have something else in place of default_db, maybe tables?)

This was the very first proposal for PRQL — no from. But folks didn't like it — I think because having each line start with the verb of what it's doing is nice & consistent...

max-sixty avatar Aug 15 '23 19:08 max-sixty

Coming late to this discussion but will give it some thought as well.

snth avatar Aug 15 '23 20:08 snth