dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

A very-WIP implementation of the PRQL plugin, for discussion

Open max-sixty opened this issue 3 years ago • 6 comments
trafficstars

As discussed with some folks in https://github.com/dbt-labs/dbt-core/discussions/5796 & offline

This is a very WIP implementation of the PRQL plugin. I'm not overjoyed about the code I've written, but I figured it was better to get it out in the open.

There are lots of notes in the code; to summarize:

  • I was hoping to write something not overly generic, but at least modular.
  • This gets some of the way there — it takes the "API" rather than "Jinja" path for most of it, as discussed in #5796.
  • But then eventually passes the compiled model off to ModelParser as a SQL file.
  • I added some notes on a better design, but as discussed, I'd like to socialize this to ensure we're on the same page before investing further. I found the current model parser code quite coupled, and SQL is high in a very long inheritance tree, so making it elegant is not trivial. The python parser is impressive, but def bolted on!
  • This code has unit tests but not an integration test (I also haven't hand-tested it end-to-end; I suspect there may be parts that don't work).
  • Note that it includes a file _dbt_prql.py that would be in dbt-prql, but I'm including here for the PR so we can iterate without coordinating dependencies. (That file also does a hack to emulate prqlc, the rust compiler that's bundled with prql-python / dbt-prql)

Thanks in advance for folks' feedback and guidance.

Description

Checklist

  • [x] I have read the contributing guide and understand what's expected of me
  • [x] I have signed the CLA
  • [ ] I have run this code in development and it appears to resolve the stated issue
  • [x] This PR includes tests, or tests are not required/relevant for this PR
  • [ ] I have opened an issue to add/update docs, or docs changes are not required/relevant for this PR
  • [ ] I have run changie new to create a changelog entry

max-sixty avatar Oct 02 '22 08:10 max-sixty

It's in a state where it would be useful to:

  • discuss the current proposed implementation
  • discuss the best path between here and a nice generic approach, balancing elegance with practicality
  • confirm we're all aligned on having PRQL including in some form

On balancing elegance with practicality — my biggest concern is making this very generic would involve quite some refactoring & inversion, if I understand the code correctly (very possibly I don't though, please correct me!)

Specifically, the ModelParser seems to be the most congruent place to implement different languages — so each language would define a SQLModelParser / PythonModelParser / etc, with listing references, replacing them, etc (as discussed in #5796). That would likely involve splitting the parts of ModelParser that are SQL-specific from those that are generic. But ModelParser's mro is:

  • dbt.parser.models.ModelParser
  • dbt.parser.base.SimpleSQLParser
  • dbt.parser.base.SQLParser
  • dbt.parser.base.ConfiguredParser
  • dbt.parser.base.Parser
  • dbt.parser.base.BaseParser

...so it's long, and SQL is quite deep. So hence the heavy refactor that I think would be required to do this in an ideal way.

Comparing that to the current python parsing — python parsing is implemented with an additional method on the ModelParser (thanks @jtcohen6 for the original pointer, that was v helpful in getting me started). The current prql implementation takes a slightly more modular approach, still branching within ModelParser, but then calls out to a PrqlLanguageProvider, before passing the SQL back to ModelParser. In the more generic proposal above, the PrqlLanguageProvider would become a PrqlModelParser, and be composed with the parts of ModelParser that were relevant.

Is that right? Is there an easier approach here? What would be the best path there, assuming we don't want to wait until it's perfect before merging anything?


I'm open minded on the way forward. As discussed with folks, I'm very keen to have people write PRQL in dbt — I think it would be great for PRQL to have the reach of dbt & its users, and it would be great for dbt & its users to have an ergonomic language option. Given PRQL's focus is the data transformation, rather than project management & DAG management, there is a uniquely good fit between the two. So I'm happy to drive this and iterate to get this in.

max-sixty avatar Oct 03 '22 07:10 max-sixty

Lots of interesting things here!

We did look at creating a different node type for Python nodes, but in the end decided not to do it because there a fair amount of complications. Was there some reason you decided to go that direction? It also feels a bit wrong to have references to PrqlNode, if the actual prql code doesn't exist in the repo. So I'm thinking we might need something a bit more pluggable, such that if somebody install a language provider, a new value for the 'language' attribute becomes available, and the code to compile it is found by the plugin architecture.

gshank avatar Oct 05 '22 19:10 gshank

Thanks for responding @gshank !

I very much agree with your assessment — there's a tradeoff between the "fair amount of complications" :) vs. having something pluggable.

In the currently proposed code, I avoided many of the complications, and so wasn't using the PrqlNode etc. I've just removed those for clarity.

At what level would you like to engage here? I realize these kinds of quite broad proposals can be difficult to connect with the lines of code. I've tried to summarize the current state & salient questions in prose above. Is that a reasonable starting point or do you prefer an alternative approach? Also happy to chat on Slack, or do another Zoom with the broader team if that lets us iterate faster.

max-sixty avatar Oct 06 '22 01:10 max-sixty

Thanks for the review @jtcohen6, very much agree with your comments. As discussed offline, let's touch base in a few weeks on the best path forward.

max-sixty avatar Nov 01 '22 21:11 max-sixty

It's great to see all the work that already went into making prql available in dbt in a convenient way! I'm currently developing dbt-ibis so that users can write dbt models as Ibis expressions and I've learned a lot from looking at dbt-prql and this PR. Thanks! Curious to see how this continues :)

binste avatar Sep 07 '23 15:09 binste

Is this dead?

wearpants avatar Feb 19 '24 18:02 wearpants

Is this dead?

I would love to revive it! There are 22 :+1:s, so there is a decent amount of community interest... If there were interest from the dbt team I'd be happy to make a significant time investment to make something that works.

My read is that the dbt team has other priorities at the moment — understandable — and so I'm not sure there is a viable path without that changing.

But if there is a path forward, let me know!


For any future travelers, the precursor to this was https://github.com/PRQL/dbt-prql. But that doesn't work with some dialects, and the implementation is extremely hacky, so we're not supporting it, unfortunately. There's a note stating this on the repo.

max-sixty avatar Feb 26 '24 06:02 max-sixty

@max-sixty Long time no talk!

You're right that we're focused on other priorities at the moment. The primary ones are adding unit testing (for SQL models) and stabilizing dbt-core's external interfaces (especially for adapter maintainers). There's more detail about this in our last roadmap post.

I still find this work fun & interesting. A number of the internals have changed since we sketched out the rough initial implementation over a year ago. I don't think we can do anything here before the v1.8 release (planned for April), but I would be up for taking another look after, once the dust has settled.

jtcohen6 avatar Feb 26 '24 15:02 jtcohen6

I’m here for it!

randypitcherii avatar Feb 29 '24 14:02 randypitcherii

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

github-actions[bot] avatar Mar 22 '24 23:03 github-actions[bot]