opendal
new feature: Alternate approach for Databases
Feature Description
High-performance database integration for OpenDAL using connector-x and connector-arrow.
This brings Arrow-native, zero-copy I/O capabilities to OpenDAL’s data access layer, enabling fast interaction with databases such as PostgreSQL, MySQL, SQLite, DuckDB, BigQuery, MSSQL, and Oracle.
Problem and Solution
Problem Statement
OpenDAL currently provides a minimal interface for SQL database interaction, often relying on custom SQL strings with manual string interpolation. This approach:
- Lacks type safety and schema awareness
- Is prone to SQL injection if not used carefully
- Cannot leverage Arrow-based compute engines or data interoperability
- Does not support zero-copy high-performance data transfers
- Limits scalability for large result sets or analytical workloads
As a result, users cannot currently use OpenDAL as a performant, Arrow-native abstraction for databases in the same way they can for object storage, filesystems, or cloud-native sources.
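To make the injection risk from the list above concrete, here is a minimal sketch contrasting interpolated SQL with a bound parameter. The `Query` struct and both functions are hypothetical and use only the standard library; they are not OpenDAL or sqlx APIs, and the `$1` placeholder simply follows PostgreSQL-style numbering.

```rust
/// Hypothetical query value: SQL text plus parameters carried separately.
/// Not an OpenDAL or sqlx type; for illustration only.
#[derive(Debug, PartialEq)]
struct Query {
    sql: String,
    params: Vec<String>,
}

/// Risky: user input is spliced directly into the SQL text.
fn interpolated(name: &str) -> Query {
    Query {
        sql: format!("SELECT * FROM users WHERE name = '{}'", name),
        params: Vec::new(),
    }
}

/// Safer: the SQL text is fixed and the input travels as a bound parameter.
fn parameterized(name: &str) -> Query {
    Query {
        sql: "SELECT * FROM users WHERE name = $1".to_string(),
        params: vec![name.to_string()],
    }
}
```

With an input such as `x' OR '1'='1`, `interpolated` produces SQL whose `WHERE` clause is rewritten by the input, while `parameterized` leaves the statement text unchanged and keeps the input as data.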
Proposed Solution
To address this, I propose a new DatabaseAccessor (or similar) for OpenDAL that internally integrates one of the following:
- connector-x: A Rust-native, Arrow-based connector that supports eager data loading from many databases into arrow::RecordBatches or arrow2 tables with parallelized reads. Polars also relies on connector-x for database interaction.
- connector_arrow: A zero-copy, Arrow streaming abstraction for database query execution, allowing efficient lazy reads as Arrow streams. It is inspired by connector-x but focuses on being a Rust-only implementation with a minimal dependency footprint.
With this integration, OpenDAL can provide:
- Eager and Lazy Arrow I/O from SQL databases
- Support for PostgreSQL, MySQL, DuckDB, SQLite, BigQuery, MSSQL, Oracle, and more
- Schema introspection and optimized record batching. Record batches can work with next(..) on the reader and writer.
- Consistent interface across object store and database services
- Clean separation of concerns between storage orchestration (OpenDAL) and query execution (ConnectorX/Arrow)
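The `next(..)`-style batching mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Batch` is a hypothetical stand-in for an Arrow record batch, `BatchReader` is a hypothetical name, and the rows come from memory rather than a database. Neither type exists in OpenDAL, connector-x, or connector_arrow.

```rust
/// Hypothetical stand-in for an Arrow record batch: column names plus rows.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<String>,
    rows: Vec<Vec<i64>>,
}

/// Hypothetical lazy reader that yields fixed-size batches on each `next()`.
struct BatchReader {
    columns: Vec<String>,
    rows: std::vec::IntoIter<Vec<i64>>,
    batch_size: usize,
}

impl BatchReader {
    fn new(columns: Vec<String>, rows: Vec<Vec<i64>>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { columns, rows: rows.into_iter(), batch_size }
    }

    /// Schema introspection: column names are known before any batch is read.
    fn schema(&self) -> &[String] {
        &self.columns
    }
}

impl Iterator for BatchReader {
    type Item = Batch;

    /// Pull up to `batch_size` rows; return `None` once the source is drained.
    fn next(&mut self) -> Option<Batch> {
        let rows: Vec<Vec<i64>> = self.rows.by_ref().take(self.batch_size).collect();
        if rows.is_empty() {
            None
        } else {
            Some(Batch { columns: self.columns.clone(), rows })
        }
    }
}
```

Because the reader is an `Iterator`, a lazy consumer calls `next()` one batch at a time, while an eager consumer can call `collect::<Vec<_>>()` to load everything at once.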
Additional Context
What do you think?
Are you willing to contribute to the development of this feature?
- [x] Yes, I am willing to contribute to the development of this feature.
This idea sounds interesting, but I feel it doesn't align with OpenDAL's vision. It feels more like a separate project focused on structural data areas.
Yes, I thought so; I just wanted to check with the maintainers.
Maybe this can be one of the "integrations" in a separate repo, like oli, etc.
OpenDAL uses sqlx to connect to databases, so your suggestion amounts to changing the client interface of the OpenDAL services.
This work doesn't need a new database layer.
Thank you very much!
https://github.com/apache/opendal/issues/853
Maybe you need to provide an example to show the value of your idea?