prql icon indicating copy to clipboard operation
prql copied to clipboard

s-string impact on semantic checks

Open max-sixty opened this issue 3 years ago • 2 comments
trafficstars

Currently this raises an error:

func filter_amount method -> s"sum(case when payment_method = '{method}' then amount end) as {method}_amount"

from {{ ref('raw_payments') }}
group order_id (
  aggregate [
    bank_transfer_amount = filter_amount bank_transfer,
  ]
)
select bank_transfer_amount
Error: 
   ╭─[:9:8]
   │
 9 │ select bank_transfer_amount
   ·        ──────────┬─────────  
   ·                  ╰─────────── Unknown variable `bank_transfer_amount`
───╯

bank_transfer_amount is created in the original query, but in the s-string, so our semantic code doesn't know about it.

The benefits of s-strings are that they're a flexible escape hatch. But with a flexible escape hatch we lose information on what escaped!

I'm not sure what the best thing to do here is:

  • Allow an override to semantic checks in the PRQL header?
  • Parse the s-string for as and override checks if we find one??
  • Skip all semantic checks if there's an s-string??
  • We could force writing bank_transfer_amount = filter_amount bank_transfer (and adjusting the function)?
    • That's fine in this case, but I worry about bounding s-strings more generally — I think we should be quite humble about thinking we can guess what queries people need to write, and realistically even fairly standard queries will need to rely on s-strings.

max-sixty avatar May 30 '22 04:05 max-sixty

Columns names in SQL are not all analogous to PRQL variables. There may exist temporary SQL columns that are not part of your PRQL query (see ROW_NUMBER in second example).

This is why we should highly discourage using AS in s-strings as it "adds" a name in SQL scope, but not in PRQL. Moreover, s-strings may break things if they don't evaluate to expressions only.

Instead, we should provide better language features to steer people away from the thing you wanted to do here. We may need some type of templating, so vanilla PRQL, without jinja, can do something like:

func filter_amount method -> s"sum(case when payment_method = '{method}' then amount end)"

from aaa
aggregate [(${col}_amount = filter_amount col) for col in [bank_transfer, taxes, gross_cost]]

(but with better syntax than Python's - maybe something more functional)

PS: Your statement above would now evaluate to something like:

... then amount end) as bank_transfer_amount as bank_transfer_amount ...

aljazerzen avatar Jun 09 '22 13:06 aljazerzen

Instead, we should provide better language features to steer people away from the thing you wanted to do here.

Very much agree!

We may need some type of templating, so vanilla PRQL, without jinja, can do something like:

I think we should really try and avoid string templating, so a slight tweak of your proposal — we could have an actual function which does this sort of operation — basically a pivot, which the values known at compile time. I think it's probably common enough.

So more like

pivot payment_method [bank_transfer, taxes, gross_cost] $_amount

That said, I worry that s-strings don't have escape hatches for things like this as we ramp up. If there were a way of doing a # noqa, I think that would be great.

max-sixty avatar Jun 11 '22 21:06 max-sixty

This is done / duplicate of #644

aljazerzen avatar Jan 02 '23 11:01 aljazerzen