Decoupling the query language from the query engine?

Open snth opened this issue 2 years ago • 2 comments

Hi,

I watched your HYTRADBOI 2022 talk and this looks super cool!

I'm wondering how closely linked the query language is to the query engine itself?

To me it seems like the real magic is your query engine and I would love to see that exposed through other query languages. In particular I'm a contributor to PRQL and would love to be able to run PRQL queries on your query engine. PRQL currently only supports SQL as a backend and plans to go beyond that are quite some way off atm but it's interesting to think about.

Where would you see the limitations of providing a SQL frontend (and hence also PRQL) to trustfall?

snth • Feb 02 '23 07:02

Hi,

Thanks for watching the talk and for looking into Trustfall!

My stance on other frontends for the Trustfall engine is "YES PLEASE" 😁 My ultimate goal for Trustfall is to make it the LLVM of data: LLVM intermediates between programming languages and hardware, and Trustfall should intermediate between data use and data providers regardless of format, access pattern, storage mechanism, etc.

In light of the above, the language and engine today are somewhat tied together, but not particularly strongly so. I've been following the work on PRQL and I'd be interested in exploring what a PRQL/SQL frontend might look like. I suspect I won't have time to write code for it myself (I'm instead working on something that produces a literal 1000x speedup), but I'd be happy to support you or someone else taking point on it.

The key points I think would be worth keeping an eye on are:

  • Right now, the IR uses the GraphQL type system to represent types internally. An early prototype of a PRQL/SQL frontend should probably do the same, via a small translation layer from SQL types. If that prototype is successful, we can later switch the IR to a more agnostic type system.
  • Trustfall only allows joins on pre-specified relations, whereas SQL (and presumably PRQL?) allows joining anything to anything else, and could in principle be used to write nonsensical joins like joining today's temperature to the scores of last night's basketball game. People using Trustfall via PRQL/SQL may not be used to the system preventing an operation that has a well-defined result in relational algebra but is semantically nonsensical. I'd say this is more of a feature than a bug ... but some SQL users might disagree 😅
  • PRQL/SQL have some features that Trustfall today considers explicitly out of scope, like a general ORDER BY ability. Trustfall preserves the order of data entries produced by the underlying data provider, and Trustfall's joins are function-like in that they could be made to take an ordering parameter (resolved by the data provider). This functionality has been sufficient thus far, and giving up the general ORDER BY meant we were able to implement other, more important functionality instead (happy to go into more detail if that's ever useful). So supporting ORDER BY in PRQL/SQL executed via Trustfall may require an additional post-processing step, and I wouldn't recommend building it at first.
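
To make the first and third points a bit more concrete, here is a minimal Python sketch of what the two pieces might look like in a frontend prototype. Everything in it is an assumption for illustration: `SQL_TO_GRAPHQL`, `to_graphql_type`, and `order_by` are hypothetical names, the type mapping is one plausible choice rather than anything Trustfall defines, and result rows are assumed to arrive as plain dictionaries with comparable, non-null values in the sort columns.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical mapping from a few SQL column types to GraphQL's built-in
# scalar types. The choice of pairs is an assumption, not Trustfall's behavior.
SQL_TO_GRAPHQL: Dict[str, str] = {
    "INTEGER": "Int",
    "REAL": "Float",
    "BOOLEAN": "Boolean",
    "TEXT": "String",
    "VARCHAR": "String",
}


def to_graphql_type(sql_type: str, nullable: bool = True) -> str:
    """Translate a SQL type name into GraphQL type syntax.

    Length arguments such as VARCHAR(255) are dropped, and NOT NULL
    columns get GraphQL's non-null marker "!".
    """
    base = sql_type.strip().upper().split("(")[0].strip()
    if base not in SQL_TO_GRAPHQL:
        raise ValueError(f"no GraphQL mapping for SQL type {sql_type!r}")
    gql = SQL_TO_GRAPHQL[base]
    return gql if nullable else gql + "!"


def order_by(
    rows: Iterable[Dict[str, Any]],
    keys: List[Tuple[str, bool]],
) -> List[Dict[str, Any]]:
    """Sort already-produced result rows as a post-processing step.

    `keys` is a list of (column, descending) pairs, most significant first.
    Sorting once per key, from least to most significant, relies on
    Python's sort being stable.
    """
    result = list(rows)  # materialize: a general ORDER BY needs all rows
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


if __name__ == "__main__":
    print(to_graphql_type("varchar(255)", nullable=False))
    rows = [
        {"name": "b", "score": 1},
        {"name": "a", "score": 2},
        {"name": "c", "score": 2},
    ]
    for row in order_by(rows, [("score", True), ("name", False)]):
        print(row)
```

Note that `order_by` has to collect every row before it can return anything, which is one reason a general ORDER BY fits better as an optional step layered on top of query execution than as part of it.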

If hopping on a quick video call and walking through Trustfall queries or source code or anything else would be helpful as you decide how to move forward, I'd be happy to do that!

obi1kenobi • Feb 02 '23 16:02

Just following up — would you be interested in collaborating on a PRQL frontend for Trustfall? I've been too swamped with other work to properly look into it, but if you're open to working together on it, I'd be happy to join in!

obi1kenobi • Jun 30 '23 19:06