prql
prql copied to clipboard
Allowing dbt-prql to work with Postgres
I've been adding some integration tests to dbt-prql, and unfortunately it doesn't work with the dialects that use double-quotes.
That's because:
from in_process = {{ source('salesforce', 'in_process') }}
is first parsed to:
from "salesforce"."in_process"
...and then passed to the prql compiler. That's not valid PRQL. I had been expecting from in_process = {{ source('salesforce', 'in_process') }}
to be passed to the prql compiler first — which is valid prql — and then for the compiled sql to be formatted by dbt[^1].
[^1]:
With BQ, this is fine, since this is valid PRQL:
from `salesforce`.`in_process`
Some options for remedying it:
- Ask people to use an s-string for these — i.e.
s"{{ source('salesforce', 'in_process') }}"
— and allow s-strings infrom
- Ask people to use backticks — i.e.
`{{ source('salesforce', 'in_process') }}`
— and adjust how we handle backticks such that`"foo"`
compiles to"foo"
rather than"""foo"""
.- Currently we take the literal inside backticks, and then escape it, such that the DB receives the identifier within the backticks. That means it escapes the
"
s — as though we've called a column"foo"
, with the quotes.
- Currently we take the literal inside backticks, and then escape it, such that the DB receives the identifier within the backticks. That means it escapes the
I spent a lot of time trying to get dbt to pass the raw string to prql, and then parse the jinja after. But I couldn't manage a way, because we monkey patch into dbt's jinja, and jinja passes extensions an AST. Ofc if anyone has any ideas, then very open to exploring them.
The problem here is that DBT emits result of {{ }}
as an SQL identifier, which is not the same as PRQL identifier.
The ideal solution would be for DBT to emit a PRQL ident which would be salesforce.in_process
.
I don't think that s-strings are an option, as I wrote in #919
The problem here is that DBT emits result of
{{ }}
as an SQL identifier, which is not the same as PRQL identifier.
Yes, well summarized
The ideal solution would be for DBT to emit a PRQL ident which would be
salesforce.in_process
.
I agree — though that would require a much bigger change. Either we need dbt to support PRQL (we're in discussions with them now), or we'd need to do very invasive monkeypatching, a whole extra level, and we're already quite extreme there...
I'll continue the discussion #919.
Otherwise we could just not support postgres / redshift / snowflake until we're better integrated into dbt.. Maybe we work on increasing adoption in BQ first, before we sink more time into making it work with those.
If I remember correctly, we decided not to parse jinja expressions with PRQL and instead create a dbt plugin that would allow to run dbt first, resolve jinja expressions and only then run PRQL.
The plugin is here dbt-core#5982.