prql
prql copied to clipboard
s-string impact on semantic checks
Currently this raises an error:
func filter_amount method -> s"sum(case when payment_method = '{method}' then amount end) as {method}_amount"
from {{ ref('raw_payments') }}
group order_id (
aggregate [
bank_transfer_amount = filter_amount bank_transfer,
]
)
select bank_transfer_amount
Error:
╭─[:9:8]
│
9 │ select bank_transfer_amount
· ──────────┬─────────
· ╰─────────── Unknown variable `bank_transfer_amount`
───╯
bank_transfer_amount is created in the original query, but in the s-string, so our semantic code doesn't know about it.
The benefits of s-strings are that they're a flexible escape hatch. But with a flexible escape hatch we lose information on what escaped!
I'm not sure what the best thing to do here is:
- Allow an override to semantic checks in the PRQL header?
- Parse the s-string for
asand override checks if we find one?? - Skip all semantic checks if there's an s-string??
- We could force writing
bank_transfer_amount = filter_amount bank_transfer(and adjusting the function)?- That's fine in this case, but I worry about bounding s-strings more generally — I think we should be quite humble about thinking we can guess what queries people need to write, and realistically even fairly standard queries will need to rely on s-strings.
Columns names in SQL are not all analogous to PRQL variables. There may exist temporary SQL columns that are not part of your PRQL query (see ROW_NUMBER in second example).
This is why we should highly discourage using AS in s-strings as it "adds" a name in SQL scope, but not in PRQL. Moreover, s-strings may break things if they don't evaluate to expressions only.
Instead, we should provide better language features to steer people away from the thing you wanted to do here. We may need some type of templating, so vanilla PRQL, without jinja, can do something like:
func filter_amount method -> s"sum(case when payment_method = '{method}' then amount end)"
from aaa
aggregate [(${col}_amount = filter_amount col) for col in [bank_transfer, taxes, gross_cost]]
(but with better syntax than Python's - maybe something more functional)
PS: Your statement above would now evaluate to something like:
... then amount end) as bank_transfer_amount as bank_transfer_amount ...
Instead, we should provide better language features to steer people away from the thing you wanted to do here.
Very much agree!
We may need some type of templating, so vanilla PRQL, without jinja, can do something like:
I think we should really try and avoid string templating, so a slight tweak of your proposal — we could have an actual function which does this sort of operation — basically a pivot, which the values known at compile time. I think it's probably common enough.
So more like
pivot payment_method [bank_transfer, taxes, gross_cost] $_amount
That said, I worry that s-strings don't have escape hatches for things like this as we ramp up. If there were a way of doing a # noqa, I think that would be great.
This is done / duplicate of #644