prql icon indicating copy to clipboard operation
prql copied to clipboard

Allowing dbt-prql to work with Postgres

Open max-sixty opened this issue 1 year ago • 2 comments

I've been adding some integration tests to dbt-prql, and unfortunately it doesn't work with the dialects that use double-quotes.

That's because:

from in_process = {{ source('salesforce', 'in_process') }}

is first parsed to:

from "salesforce"."in_process"

...and then passed to the prql compiler. That's not valid PRQL. I had been expecting from in_process = {{ source('salesforce', 'in_process') }} to be passed to the prql compiler first — which is valid prql — and then for the compiled sql to be formatted by dbt[^1].

[^1]: With BQ, this is fine, since this is valid PRQL: from `salesforce`.`in_process`

Some options for remedying it:

  • Ask people to use an s-string for these — i.e. s"{{ source('salesforce', 'in_process') }}" — and allow s-strings in from
  • Ask people to use backticks — i.e. `{{ source('salesforce', 'in_process') }}` — and adjust how we handle backticks such that `"foo"` compiles to "foo" rather than """foo""".
    • Currently we take the literal inside backticks, and then escape it, such that the DB receives the identifier within the backticks. That means it escapes the "s — as though we've called a column "foo", with the quotes.

I spent a lot of time trying to get dbt to pass the raw string to prql, and then parse the jinja after. But I couldn't manage a way, because we monkey patch into dbt's jinja, and jinja passes extensions an AST. Ofc if anyone has any ideas, then very open to exploring them.

max-sixty avatar Aug 07 '22 23:08 max-sixty

The problem here is that DBT emits result of {{ }} as an SQL identifier, which is not the same as PRQL identifier.

The ideal solution would be for DBT to emit a PRQL ident which would be salesforce.in_process.

I don't think that s-strings are an option, as I wrote in #919

aljazerzen avatar Aug 10 '22 10:08 aljazerzen

The problem here is that DBT emits result of {{ }} as an SQL identifier, which is not the same as PRQL identifier.

Yes, well summarized

The ideal solution would be for DBT to emit a PRQL ident which would be salesforce.in_process.

I agree — though that would require a much bigger change. Either we need dbt to support PRQL (we're in discussions with them now), or we'd need to do very invasive monkeypatching, a whole extra level, and we're already quite extreme there...


I'll continue the discussion #919.

Otherwise we could just not support postgres / redshift / snowflake until we're better integrated into dbt.. Maybe we work on increasing adoption in BQ first, before we sink more time into making it work with those.

max-sixty avatar Aug 11 '22 00:08 max-sixty

If I remember correctly, we decided not to parse jinja expressions with PRQL and instead create a dbt plugin that would allow to run dbt first, resolve jinja expressions and only then run PRQL.

The plugin is here dbt-core#5982.

aljazerzen avatar Dec 01 '22 11:12 aljazerzen