prql
prql copied to clipboard
Reserved column names like "from" and "to"
I have trouble using column names that are reserved keywords, for example from
and to
. There are two sides to this issue:
- PRQL has syntax errors when parsing these names.
- The generated SQL code does not put the names into backquotes.
I managed to use S-Strings as a partial workaround, but they don't work at the left side of =
.
s"`from`"
Good point @neelance .
I think this is how it should work:
derive `from` = 2
but at the moment from
isn't escaped, because PRQL doesn't know it's a keyword:
SELECT
2 AS
from
I think if we use this list, that should cover it: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/keywords.rs#L65-L66
If anyone wants to take this, it should be a fairly easy change, and would go here.
Otherwise I'll look at this after the current priority issues.
This actually isn't as easy as it seems — @AlexRiedler made a great observation — sometimes we need keywords unquoted.
The as
function is a case of this — with the change I proposed above, derive [foobar = (salary | as int)]
would compile to "int"
, but we need int
.
So I think this might need to be coupled with something more invasive — typing the parameter of as
to be a keyword, which then isn't quoted.
Or is there a simpler way?
This has been resolved by #593. This query now produces an error:
from travels
select from = from_destination # this is all valid
derive derived = from + 3 # error here
Error:
╭─[:3:18]
│
3 │ derive derived = from + 3
· ──┬──
· ╰──── Ambiguous reference. Could be from any of std.from, _frame.from
───╯
Changing to:
from travels
select from = from_destination
derive derived = _frame.from + 3
... produces:
SELECT
from_destination AS
from
,
from_destination + 3 AS derived
FROM
travels
Which is valid SQL, even though formatter trips a bit.
I couldn't get this to work. The best I could manage was:
select [call_id, timestamp, business_hours, direction, type, from_ = s"`from`", to_ = s"`to`", peer]
I couldn't get it to work without the S-String and I had to use from_
instead of from
.
Reopen?
This works in the playground:
from employees
select [call_id, timestamp, business_hours, direction, type, from_ = _frame.from, to_ = to, peer]
If you are using one of the bindings or vscode, you'll have to wait a few days for 0.3 to be pushed.
In the playground you get:
SELECT
call_id,
timestamp,
business_hours,
direction,
type,
from
AS from_,
to AS to_,
peer
FROM
employees
This is not valid SQL, at least not for BigQuery.
It seems like resolving is fixed — but in the translation, to what extent do we need to quote the keywords in the resulting SQL?
i.e.
SELECT
call_id,
timestamp,
business_hours,
direction,
type,
- from AS from_,
+ `from` AS from_,
- to AS to_,
+ `to` AS to_,
peer
FROM
employees
I think that's required at least for BigQuery.
One other thought: I think the from
being backticked should mean it's forced into an ident, rather than possibly interpreted as a function? But possibly one advantage of the new resolver is that there's less of a hard distinction there now.
from travels
select from = from_destination
derive derived = `from` + 3
Error:
╭─[:3:18]
│
3 │ derive derived = `from` + 3
· ───┬───
· ╰───── Ambiguous reference. Could be from any of std.from, _frame.from
───╯
Maybe this is something I could have a look at?
The name of the function is just an ident at first when it gets resolved. And we have a single namespace of for all idents.
Yes good point. But then at the translation — when there's an ident (i.e. there are no functions any longer) — then shouldn't be quote keywords, in the same way we quote idents with symbols in?
Apparently yes, for some dialects.
(But we can implement it for all dialects, just to keep the code complexity down.)
So we'd need a list of SQL keywords to check against that.
TODO label:"good first issue"
Quoting is done here: https://github.com/prql/prql/blob/main/prql-compiler/src/sql/codegen.rs#L537-L539
Basically, we need to check if the ident part is a keyword and use quotes in that case.
Fortunately, sqlparser has a list of keywords: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/keywords.rs#L1
First, we need to convert String
to Keyword
, like they do:
https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/tokenizer.rs#L233-L234
Then search if it's reserved: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/parser.rs#L1896-L1898
They have two lists of reserved keywords for aliases: RESERVED_FOR_COLUMN_ALIAS
and RESERVED_FOR_TABLE_ALIAS
. Because our translate_ident_part
is not context aware and does not know if it's translating column or table ident, and because we cannot "quote too much", I think we can just check both of them. They are small enough to check linearly, but an optimization would be to create a cached reserved: HashSet<KeyWord>
on translator::Context.
Make sure to do binary search for keyword only if none of the existing conditions matches so we avoid doing it unnecessarily.