Proposal for Implementing Streaming Query Support in chDB
Currently, chDB executes queries by fetching the entire result set at once through the query_conn interface. This approach may lead to high memory usage and latency for large datasets. To address this, we propose adding streaming query capabilities to chDB.
The existing LocalServer in chDB initializes the execution engine via Connection::sendQuery and retrieves all results in one go using receiveResult, storing them in WriteBufferFromVector.
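To make the memory concern concrete, here is a minimal Python simulation of the eager pattern described above. It does not call chDB or ClickHouse; `mock_engine_chunks` and `eager_query` are hypothetical stand-ins for the execution engine and for the "receive everything into one buffer" step.

```python
from typing import Iterator, List


def mock_engine_chunks(total_rows: int, chunk_rows: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for the execution engine: yields result chunks."""
    for start in range(0, total_rows, chunk_rows):
        yield list(range(start, min(start + chunk_rows, total_rows)))


def eager_query(total_rows: int, chunk_rows: int) -> List[int]:
    """Drain every chunk into one buffer before returning anything.

    This mirrors the current behavior only loosely: the caller sees no
    rows until the whole result set is sitting in memory.
    """
    buffer: List[int] = []
    for chunk in mock_engine_chunks(total_rows, chunk_rows):
        buffer.extend(chunk)
    return buffer


rows = eager_query(total_rows=10, chunk_rows=4)
print(len(rows))  # 10
```

In this sketch the size of `buffer` grows with the result set, which is the behavior the proposal aims to avoid for large queries.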
Proposed Changes
- chDB Interface Modifications
  - New send_query Interface: Introduce a send_query method to initialize a streaming query. This method returns a stream_local_result object with a fetch method.
  - fetch Method in stream_local_result: Each call to fetch returns a single row (or a chunk) in the specified format (e.g., JSON, Arrow), enabling incremental data consumption.
- LocalServer (ClientBase) Adjustments
  - Deferred Result Retrieval: During the first initialization, only call Connection::sendQuery to set up the execution engine without fetching results immediately.
  - On-Demand receiveResult Calls: When fetch is invoked, trigger receiveResult to retrieve a chunk of data. Once the chunk is exhausted, call receiveResult again for the next chunk.
  - Handling Blocking: If receiveResult is not called for an extended period, the execution engine may block.
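The proposed flow could look roughly like the Python sketch below. It is a simulation under stated assumptions, not chDB code: `send_query`, `StreamLocalResult`, and `fetch` borrow their names from this proposal, while `mock_blocks`, the `log` parameter, and the `None` end-of-stream marker are invented for illustration. A Python generator stands in for the execution engine; it stays suspended between `fetch` calls, which is only a loose analogue of the engine blocking while receiveResult is not being called.

```python
from typing import Iterator, List, Optional


def mock_blocks(total_rows: int, chunk_rows: int, log: List[int]) -> Iterator[List[int]]:
    """Hypothetical engine: produces one chunk each time it is resumed."""
    for start in range(0, total_rows, chunk_rows):
        log.append(start)  # record the moment a chunk is actually produced
        yield list(range(start, min(start + chunk_rows, total_rows)))


class StreamLocalResult:
    """Hypothetical analogue of the proposed stream_local_result object."""

    def __init__(self, chunks: Iterator[List[int]]) -> None:
        self._chunks = chunks
        self.chunks_pulled = 0

    def fetch(self) -> Optional[List[int]]:
        """Return the next chunk, or None once the query is exhausted."""
        chunk = next(self._chunks, None)
        if chunk is not None:
            self.chunks_pulled += 1
        return chunk


def send_query(total_rows: int, chunk_rows: int, log: List[int]) -> StreamLocalResult:
    """Set up the mock engine without retrieving any results yet."""
    return StreamLocalResult(mock_blocks(total_rows, chunk_rows, log))


produced: List[int] = []
result = send_query(total_rows=10, chunk_rows=4, log=produced)
print(produced)  # [] because nothing is produced before the first fetch

while (chunk := result.fetch()) is not None:
    print(chunk)  # only one chunk is held by the consumer at a time
```

Here the consumer never holds more than one chunk, and the producer does no work until asked. A real implementation would also need to decide how long the engine may stay blocked and how a caller cancels a partially consumed stream.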
The proposal can also address https://github.com/chdb-io/chdb/issues/265
BTW, https://github.com/timeplus-io/proton is an implementation of a streaming SQL engine (like Apache Flink) built on the ClickHouse codebase. New results can be pushed to the client via HTTP/TCP.