prql icon indicating copy to clipboard operation
prql copied to clipboard

Reserved column names like "from" and "to"

Open neelance opened this issue 1 year ago • 2 comments

I have trouble using column names that are reserved keywords, for example from and to. There are two sides to this issue:

  1. PRQL has syntax errors when parsing these names.
  2. The generated SQL code does not put the names into backquotes.

I managed to use S-Strings as a partial workaround, but they don't work at the left side of =.

s"`from`"

neelance avatar Aug 01 '22 15:08 neelance

Good point @neelance .

I think this is how it should work:

derive `from` = 2

but at the moment from isn't escaped, because PRQL doesn't know it's a keyword:

SELECT
  2 AS
from

I think if we use this list, that should cover it: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/keywords.rs#L65-L66

max-sixty avatar Aug 01 '22 18:08 max-sixty

If anyone wants to take this, it should be a fairly easy change, and would go here.

Otherwise I'll look at this after the current priority issues.

max-sixty avatar Aug 01 '22 18:08 max-sixty

This actually isn't as easy as it seems — @AlexRiedler made a great observation — sometimes we need keywords unquoted.

The as function is a case of this — with the change I proposed above, derive [foobar = (salary | as int)] would compile to "int", but we need int.

So I think this might need to be coupled with something more invasive — typing the parameter of as to be a keyword, which then isn't quoted.

Or is there a simpler way?

max-sixty avatar Oct 23 '22 02:10 max-sixty

This has been resolved by #593. This query now produces an error:

from travels
select from = from_destination  # this is all valid
derive derived = from + 3       # error here
Error: 
   ╭─[:3:18]
   │
 3 │ derive derived = from + 3
   ·                  ──┬──  
   ·                    ╰──── Ambiguous reference. Could be from any of std.from, _frame.from
───╯

Changing to:

from travels
select from = from_destination
derive derived = _frame.from + 3

... produces:

SELECT
  from_destination AS
from
,
  from_destination + 3 AS derived
FROM
  travels

Which is valid SQL, even though formatter trips a bit.

aljazerzen avatar Dec 01 '22 11:12 aljazerzen

I couldn't get this to work. The best I could manage was:

select [call_id, timestamp, business_hours, direction, type, from_ = s"`from`", to_ = s"`to`", peer]

I couldn't get it to work without the S-String and I had to use from_ instead of from.

Reopen?

neelance avatar Dec 01 '22 15:12 neelance

This works in the playground:

from employees
select [call_id, timestamp, business_hours, direction, type, from_ = _frame.from, to_ = to, peer]

If you are using one of the bindings or vscode, you'll have to wait a few days for 0.3 to be pushed.

aljazerzen avatar Dec 01 '22 15:12 aljazerzen

In the playground you get:

SELECT
  call_id,
  timestamp,
  business_hours,
  direction,
  type,
from
  AS from_,
  to AS to_,
  peer
FROM
  employees

This is not valid SQL, at least not for BigQuery.

neelance avatar Dec 01 '22 17:12 neelance

It seems like resolving is fixed — but in the translation, to what extent do we need to quote the keywords in the resulting SQL?

i.e.

SELECT
  call_id,
  timestamp,
  business_hours,
  direction,
  type,
-  from AS from_,
+  `from` AS from_,
-  to AS to_,
+  `to` AS to_,
  peer
FROM
  employees

I think that's required at least for BigQuery.


One other thought: I think the from being backticked should mean it's forced into an ident, rather than possibly interpreted as a function? But possibly one advantage of the new resolver is that there's less of a hard distinction there now.

from travels
select from = from_destination
derive derived = `from` + 3
Error: 
   ╭─[:3:18]
   │
 3 │ derive derived = `from` + 3
   ·                  ───┬───  
   ·                     ╰───── Ambiguous reference. Could be from any of std.from, _frame.from
───╯

Maybe this is something I could have a look at?

max-sixty avatar Dec 02 '22 09:12 max-sixty

The name of the function is just an ident at first when it gets resolved. And we have a single namespace of for all idents.

aljazerzen avatar Dec 02 '22 09:12 aljazerzen

Yes good point. But then at the translation — when there's an ident (i.e. there are no functions any longer) — then shouldn't be quote keywords, in the same way we quote idents with symbols in?

max-sixty avatar Dec 02 '22 09:12 max-sixty

Apparently yes, for some dialects.

(But we can implement it for all dialects, just to keep the code complexity down.)

So we'd need a list of SQL keywords to check against that.

TODO label:"good first issue"

Quoting is done here: https://github.com/prql/prql/blob/main/prql-compiler/src/sql/codegen.rs#L537-L539

Basically, we need to check if the ident part is a keyword and use quotes in that case.

Fortunately, sqlparser has a list of keywords: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/keywords.rs#L1

First, we need to convert String to Keyword, like they do: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/tokenizer.rs#L233-L234

Then search if it's reserved: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/parser.rs#L1896-L1898

They have two lists of reserved keywords for aliases: RESERVED_FOR_COLUMN_ALIAS and RESERVED_FOR_TABLE_ALIAS. Because our translate_ident_part is not context aware and does not know if it's translating column or table ident, and because we cannot "quote too much", I think we can just check both of them. They are small enough to check linearly, but an optimization would be to create a cached reserved: HashSet<KeyWord> on translator::Context.

Make sure to do binary search for keyword only if none of the existing conditions matches so we avoid doing it unnecessarily.

aljazerzen avatar Dec 02 '22 10:12 aljazerzen