
new feature: Alternate approach for Databases

Open chitralverma opened this issue 6 months ago • 5 comments

Feature Description

High-performance database integration for OpenDAL using connector-x and connector-arrow.

This brings Arrow-native, zero-copy I/O capabilities to OpenDAL’s data access layer, enabling fast interaction with databases such as PostgreSQL, MySQL, SQLite, DuckDB, BigQuery, MSSQL, and Oracle.

Problem and Solution

Problem Statement

OpenDAL currently provides a minimal interface for SQL database interaction, often relying on custom SQL strings with manual string interpolation. This approach:

  • Lacks type safety and schema awareness
  • Is prone to SQL injection if not used carefully
  • Cannot leverage Arrow-based compute engines or data interoperability
  • Does not support zero-copy high-performance data transfers
  • Limits scalability for large result sets or analytical workloads

As a result, users cannot currently use OpenDAL as a performant, Arrow-native abstraction for databases in the same way they can for object storage, filesystems, or cloud-native sources.
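To make the injection concern concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It is not OpenDAL code; the `users` table and both helper functions are hypothetical and only contrast string interpolation with parameter binding.

```python
import sqlite3


def find_user_interpolated(conn, name):
    # Unsafe: the caller's value is spliced directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_bound(conn, name):
    # Safer: the value is passed as a bound parameter, never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
leaked = find_user_interpolated(conn, malicious)  # the OR clause matches every row
safe = find_user_bound(conn, malicious)           # no user has this literal name
```

With the interpolated query, the crafted input rewrites the WHERE clause and returns every row; with the bound parameter, the same input is treated as a plain string and matches nothing.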

Proposed Solution

To address this, I propose a new DatabaseAccessor (or similar) for OpenDAL that internally integrates one of the following:

  • connector-x: A Rust-native, Arrow-based connector that supports eager data loading from many databases into arrow::RecordBatches or arrow2 tables with parallelized reads. Polars also relies on connector-x for database interaction.
  • connector_arrow: A zero-copy, Arrow streaming abstraction for database query execution, allowing efficient lazy reads as Arrow streams. It is inspired by connector-x but aims to be a Rust-only implementation with a minimal dependency footprint.

With this integration, OpenDAL can provide:

  • Eager and Lazy Arrow I/O from SQL databases
  • Support for PostgreSQL, MySQL, DuckDB, SQLite, BigQuery, MSSQL, Oracle, and more
  • Schema introspection and optimized record batching. Record batches can be consumed or produced one at a time through next(..) on the reader and writer.
  • Consistent interface across object store and database services
  • Clean separation of concerns between storage orchestration (OpenDAL) and query execution (ConnectorX/Arrow)
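As a rough illustration of the lazy, next(..)-driven reading mentioned above, here is a small sketch built on Python's standard-library sqlite3 module. `BatchReader` is a hypothetical name, not an OpenDAL, connector-x, or connector_arrow API, and plain lists of tuples stand in for Arrow record batches.

```python
import sqlite3


class BatchReader:
    """Hypothetical lazy reader that yields fixed-size batches of rows."""

    def __init__(self, conn, query, batch_size):
        self._cursor = conn.execute(query)
        self._batch_size = batch_size
        # Simple schema introspection: column names from the cursor metadata.
        self.schema = [col[0] for col in self._cursor.description]

    def __iter__(self):
        return self

    def __next__(self):
        # Pull only the next batch from the database; nothing is loaded eagerly.
        rows = self._cursor.fetchmany(self._batch_size)
        if not rows:
            raise StopIteration
        return rows


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)", [(f"k{i}",) for i in range(5)]
)

reader = BatchReader(conn, "SELECT id, kind FROM events ORDER BY id", batch_size=2)
first = next(reader)   # first batch of two rows
rest = list(reader)    # remaining batches, fetched on demand
```

An Arrow-backed version would presumably return record batches with a typed schema instead of tuples, but the calling pattern (inspect the schema, then call next(..) until exhausted) would look much the same.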

Additional Context

What do you think?

Are you willing to contribute to the development of this feature?

  • [x] Yes, I am willing to contribute to the development of this feature.

chitralverma avatar May 30 '25 06:05 chitralverma

This idea sounds interesting, but I feel it doesn't align with OpenDAL's vision. It feels more like a separate project focused on structured data.

Xuanwo avatar May 30 '25 09:05 Xuanwo

Yes, I thought so; I just wanted to check with the maintainers.

Maybe this could be one of the "integrations" in a separate repo, like oli, etc.

chitralverma avatar May 30 '25 11:05 chitralverma

OpenDAL uses sqlx to connect to databases, so your suggestion amounts to changing the client interface of OpenDAL's database services.

This work doesn't need a new database layer.

Thank you very much!

I-am-Li-Ren avatar Jun 01 '25 13:06 I-am-Li-Ren

https://github.com/apache/opendal/issues/853

I-am-Li-Ren avatar Jun 01 '25 13:06 I-am-Li-Ren

Maybe you could provide an example to show the value of your idea?

I-am-Li-Ren avatar Jun 01 '25 14:06 I-am-Li-Ren