Proposal for Implementing Streaming Query Support in chDB

Open wudidapaopao opened this issue 7 months ago • 1 comments

Currently, chDB executes queries by fetching the entire result set at once through the query_conn interface. This approach may lead to high memory usage and latency for large datasets. To address this, we propose adding streaming query capabilities to chDB.

The existing LocalServer in chDB initializes the execution engine via Connection::sendQuery and retrieves all results in one go using receiveResult, storing them in WriteBufferFromVector.

Proposed Changes

1. chDB Interface Modifications
   - New send_query Interface: Introduce a send_query method to initialize a streaming query. This method returns a stream_local_result object with a fetch method.
   - fetch Method in stream_local_result: Each call to fetch returns a single row (or a chunk) in the specified format (e.g., JSON, Arrow), enabling incremental data consumption.
2. LocalServer (ClientBase) Adjustments
   - Deferred Result Retrieval: During the first initialization, only call Connection::sendQuery to set up the execution engine without fetching results immediately.
   - On-Demand receiveResult Calls: When fetch is invoked, trigger receiveResult to retrieve a chunk of data. Once the chunk is exhausted, call receiveResult again for the next chunk.
   - Handling Blocking: If receiveResult is not called for an extended period, the execution engine may block.
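To make the intended control flow concrete, here is a minimal Python sketch of the deferred, on-demand pattern described above. It is not chDB code: fake_engine, StreamLocalResult and the Python send_query below are hypothetical stand-ins for the execution engine, the proposed stream_local_result object and the proposed send_query interface. A generator plays the role of receiveResult, so nothing is produced until fetch is called, and production pauses between calls.

```python
from typing import Iterator, List, Optional

Chunk = List[dict]


def fake_engine(total_rows: int, chunk_size: int, log: List[str]) -> Iterator[Chunk]:
    """Hypothetical stand-in for the execution engine.

    Each step of this generator models one receiveResult-style pull.
    """
    for start in range(0, total_rows, chunk_size):
        end = min(start + chunk_size, total_rows)
        log.append(f"produced rows {start}..{end - 1}")
        yield [{"n": i} for i in range(start, end)]


class StreamLocalResult:
    """Hypothetical analogue of the proposed stream_local_result object."""

    def __init__(self, chunks: Iterator[Chunk]) -> None:
        # Deferred retrieval: the iterator is stored but not advanced.
        self._chunks = chunks

    def fetch(self) -> Optional[Chunk]:
        """Pull the next chunk on demand; return None once exhausted."""
        return next(self._chunks, None)


def send_query(total_rows: int, chunk_size: int, log: List[str]) -> StreamLocalResult:
    """Hypothetical analogue of send_query: set up the engine, fetch nothing."""
    return StreamLocalResult(fake_engine(total_rows, chunk_size, log))


log: List[str] = []
result = send_query(total_rows=5, chunk_size=2, log=log)
produced_before_fetch = len(log)  # the engine has not run yet

rows_seen = 0
while (chunk := result.fetch()) is not None:
    rows_seen += len(chunk)  # only one chunk is held in memory at a time

print(produced_before_fetch, rows_seen, len(log))
```

In this model the caller holds at most one chunk at a time, which is the memory benefit the proposal is after. The pause between fetch calls is also a rough analogy for the blocking concern: if the consumer stops calling fetch, the producer makes no further progress.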

The proposal can also address https://github.com/chdb-io/chdb/issues/265

Apr 15 '25 14:04 wudidapaopao

BTW, https://github.com/timeplus-io/proton is an implementation of a streaming SQL engine (like Apache Flink) built on the ClickHouse codebase. New results can be pushed to the client via HTTP/TCP.

Apr 16 '25 01:04 jovezhong
