seafowl icon indicating copy to clipboard operation
seafowl copied to clipboard

Add support for querying other databases with connectorx/sqlx

Open mildbyte opened this issue 2 years ago • 0 comments

ConnectorX: SQL-query-to-Arrow support for a bunch of databases (https://github.com/sfu-db/connector-x/blob/main/connectorx/src/sources/mod.rs)

Used by ROAPI roo (https://github.com/roapi/roapi/blob/main/columnq/src/table/database.rs#L28-L43) but it seems to load the whole query result into memory. We want to use the TableProvider abstraction instead, to support being able to load PostgreSQL/MySQL/SQLite into Seafowl in a memory-limited environment (the standard CREATE EXTERNAL TABLE + CREATE TABLE AS sequence)

  • [x] Evaluate whether using connectorx is worth the extra overhead of transitive dependencies (maybe there's some other library that builds on top of sqlx, maybe the DB drivers pulled in by sqlx are the same ones as used by connectorx)
  • [x] Build a TableProvider implementation that can stream query results from connectorx (probably a single DataFusion partition)
  • [x] Add CREATE EXTERNAL TABLE support for PostgreSQL/MySQL/SQLite/etc
    • invent a syntax for specifying a query for the external table / specifying an existing table
  • [ ] Update the reference for CREATE EXTERNAL TABLE
  • [ ] Write guides (loading/querying PostgreSQL / SQLite into Seafowl)
  • [ ] (stretch goal) compare performance of querying through DataFusion against vanilla PostgreSQL, see if there is a speedup, compare against https://duckdb.org/2022/09/30/postgres-scanner.html
  • [ ] Investigate possibilities for supporting advanced data types (e.g. Postgress arrays and JSON fields)
  • [ ] Investigate approaches for escaping column/schema names in connector-X queries automatically

mildbyte avatar Oct 03 '22 09:10 mildbyte