Lutra roadmap
What's up?
Now that the initial implementation in #4134 has merged, there are still a lot of things to implement for Lutra. Tasks in order of priority:
- [x] Python bindings, so the results of queries can be used with Pandas/Polars (underway in #4174, I need help with devops),
- [ ] test the CLI,
- [ ] reuse connections between executed queries (easy),
- [ ] define a proper lutra error type (medium),
- [x] generate database module definition (hard, requires connector_arrow support) #4182
- [ ] support for `@lutra.duckdb` (medium),
- [ ] support for `@lutra.postgres` (medium),
- [ ] CLI execute option to write results to a dir,
- [ ] use a connection pool to make execution parallel (medium),
- [ ] reuse connections between lutra invocations (requires daemon, hard),
- [ ] support for multi-database queries.
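
To make the connection reuse and connection pool items a bit more concrete, here is a minimal sketch using only the Rust standard library. It is not lutra's or connector_arrow's API: `Connection`, `Pool`, `PooledConnection` and `execute_all` are hypothetical names, and `Connection::execute` only pretends to run a query. The idea is that connections are opened once, lent out to worker threads, and returned to the pool when the guard is dropped.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for a real database connection.
#[derive(Debug)]
pub struct Connection {
    pub id: usize,
}

impl Connection {
    /// Pretends to run a query; a real connection would talk to the database.
    pub fn execute(&self, query: &str) -> String {
        format!("conn {} ran: {}", self.id, query)
    }
}

/// A fixed-size pool: connections are created once and reused across queries.
pub struct Pool {
    idle: Mutex<Vec<Connection>>,
    available: Condvar,
}

impl Pool {
    pub fn new(size: usize) -> Self {
        let idle = (0..size).map(|id| Connection { id }).collect();
        Pool { idle: Mutex::new(idle), available: Condvar::new() }
    }

    /// Blocks until a connection is free, then lends it out.
    pub fn acquire(&self) -> PooledConnection<'_> {
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(conn) = idle.pop() {
                return PooledConnection { pool: self, conn: Some(conn) };
            }
            idle = self.available.wait(idle).unwrap();
        }
    }

    /// Number of connections currently not lent out.
    pub fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

/// Guard that hands the connection back to the pool when dropped.
pub struct PooledConnection<'a> {
    pool: &'a Pool,
    conn: Option<Connection>,
}

impl PooledConnection<'_> {
    pub fn execute(&self, query: &str) -> String {
        self.conn.as_ref().expect("present until drop").execute(query)
    }
}

impl Drop for PooledConnection<'_> {
    fn drop(&mut self) {
        if let Some(conn) = self.conn.take() {
            self.pool.idle.lock().unwrap().push(conn);
            self.pool.available.notify_one();
        }
    }
}

/// Runs each query on its own thread, sharing the pool's connections.
/// Results are returned in the same order as `queries`.
pub fn execute_all(pool: &Pool, queries: &[&str]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = queries
            .iter()
            .map(|query| scope.spawn(move || pool.acquire().execute(query)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

With a pool of two connections and four queries, at most two queries run at a time and the rest wait in `acquire`. A real implementation would probably also need error handling and a way to replace broken connections, which this sketch leaves out.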
Regarding WASM support: it is limited by upstream library support. Currently no data source libraries used by connector_arrow compile for wasm32-unknown-unknown, but it seems like rusqlite is close.
When that is done, lutra still won't compile as it needs access to a file system to discover the project. I've specifically made sure that the `discover` module is standalone and could be hidden behind a feature. In this configuration, we could make a JS library, `lutra-wasm`, that accepts an already-discovered project that is stored somewhere else in browser memory.
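
A rough sketch of how that split could look, assuming a Cargo feature named `discover` (the feature name, `ProjectSources`, `from_files` and `module_names` are all hypothetical and not taken from the lutra source). File-system access lives only in the gated module, while everything else works on a project that is already in memory, which is what a WASM build would receive from the browser.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of an already-discovered project:
/// a map from relative file path to PRQL source text.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ProjectSources {
    pub files: BTreeMap<String, String>,
}

impl ProjectSources {
    /// Builds a project from (path, source) pairs, with no file-system access.
    pub fn from_files<'a>(files: impl IntoIterator<Item = (&'a str, &'a str)>) -> Self {
        let files = files
            .into_iter()
            .map(|(path, source)| (path.to_string(), source.to_string()))
            .collect();
        ProjectSources { files }
    }

    /// Module names derived from file paths, in sorted path order.
    pub fn module_names(&self) -> Vec<&str> {
        self.files
            .keys()
            .map(|path| path.strip_suffix(".prql").unwrap_or(path.as_str()))
            .collect()
    }
}

/// File-system discovery, compiled only when the assumed `discover`
/// feature is enabled, so a wasm32 build can leave it out entirely.
#[cfg(feature = "discover")]
pub mod discover {
    use super::ProjectSources;
    use std::{fs, io, path::Path};

    /// Reads every `.prql` file directly under `root` (non-recursive).
    pub fn discover(root: &Path) -> io::Result<ProjectSources> {
        let mut project = ProjectSources::default();
        for entry in fs::read_dir(root)? {
            let path = entry?.path();
            if path.extension().map_or(false, |ext| ext == "prql") {
                let name = path.file_name().unwrap().to_string_lossy().into_owned();
                project.files.insert(name, fs::read_to_string(&path)?);
            }
        }
        Ok(project)
    }
}
```

A native build would enable the feature and call `discover::discover`, while a browser build would construct `ProjectSources` from whatever it already holds in memory.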
Overall, great!
> When that is done, lutra still won't compile as it needs access to a file system to discover the project. I've specifically made sure that the `discover` module is standalone and could be hidden behind a feature. In this configuration, we could make a JS library, `lutra-wasm`, that accepts an already-discovered project that is stored somewhere else in browser memory.

Yes! Or a separate function could do the collection and pass it to `lutra-wasm` as a string...
I'm not sure how lutra works, but am I correct in assuming that it automatically recognizes the schema of tables? Do you have plans to experiment with behaviors that would not be possible without the schema like #3133?
It does have this capability (see https://github.com/aljazerzen/connector_arrow/blob/main/connector_arrow/src/api.rs for what connector_arrow supports), and my approach to passing this information to prqlc is to generate type definitions in PRQL.
A naive approach would be for lutra to pass schema information to prqlc directly in some internal representation, but that would couple lutra and prqlc very tightly. Instead, I added a `pull-schema` command to lutra (I don't have a better link for examples, I need to add CLI tests), and prqlc can then work with the type definitions directly.
TL;DR: lutra will allow pulling the schema into PRQL source, which will avoid some compiler problems and will allow us to say "in this case, the compiler may error out and say it needs more schema info".
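
To illustrate the idea of generating type definitions instead of sharing an internal representation, here is a small sketch that renders a schema as PRQL source text. Everything in it is an assumption: `Column`, `Table` and `render_module` are hypothetical names, and the emitted declaration syntax is only an approximation of what typed table declarations in PRQL might look like, not a confirmed copy of the `pull-schema` output.

```rust
/// Hypothetical column description, as a connector might report it.
pub struct Column<'a> {
    pub name: &'a str,
    /// Name of the PRQL type (assumed spelling, e.g. "int" or "text").
    pub ty: &'a str,
}

/// Hypothetical table description: a name plus its columns.
pub struct Table<'a> {
    pub name: &'a str,
    pub columns: Vec<Column<'a>>,
}

/// Renders the tables as declarations inside a PRQL module.
/// The output format is illustrative, not the real pull-schema format.
pub fn render_module(module: &str, tables: &[Table]) -> String {
    let mut out = format!("module {module} {{\n");
    for table in tables {
        let columns: Vec<String> = table
            .columns
            .iter()
            .map(|c| format!("{} = {}", c.name, c.ty))
            .collect();
        // Each table is declared as an array of tuples with typed fields.
        out.push_str(&format!("  let {} <[{{{}}}]>\n", table.name, columns.join(", ")));
    }
    out.push_str("}\n");
    out
}
```

Because the result is plain PRQL text, the compiler only ever sees source code, which keeps the two projects loosely coupled in the way described above.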
I think that approach is the best choice, given that PRQL currently works without a schema, while things like compiling to Substrait would not be possible without schema information. Great job!