seafowl
seafowl copied to clipboard
Add support for querying other databases with connectorx/sqlx
ConnectorX: SQL-query-to-Arrow support for a bunch of databases (https://github.com/sfu-db/connector-x/blob/main/connectorx/src/sources/mod.rs)
Used by ROAPI roo (https://github.com/roapi/roapi/blob/main/columnq/src/table/database.rs#L28-L43) but it seems to load the whole query result into memory. We want to use the TableProvider abstraction instead, to support being able to load PostgreSQL/MySQL/SQLite into Seafowl in a memory-limited environment (the standard CREATE EXTERNAL TABLE + CREATE TABLE AS sequence)
- [x] Evaluate whether using connectorx is worth the extra overhead of transitive dependencies (maybe there's some other library that builds on top of sqlx, maybe the DB drivers pulled in by sqlx are the same ones as used by connectorx)
- [x] Build a
TableProvider
implementation that can stream query results from connectorx (probably a single DataFusion partition) - [x] Add
CREATE EXTERNAL TABLE
support for PostgreSQL/MySQL/SQLite/etc- invent a syntax for specifying a query for the external table / specifying an existing table
- [ ] Update the reference for
CREATE EXTERNAL TABLE
- [ ] Write guides (loading/querying PostgreSQL / SQLite into Seafowl)
- [ ] (stretch goal) compare performance of querying through DataFusion against vanilla PostgreSQL, see if there is a speedup, compare against https://duckdb.org/2022/09/30/postgres-scanner.html
- [ ] Investigate possibilities for supporting advanced data types (e.g. Postgress arrays and JSON fields)
- [ ] Investigate approaches for escaping column/schema names in connector-X queries automatically