
[Enhancement]: New Search Architecture Based On Streaming Service

Open chyezh opened this issue 8 months ago • 10 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

What would you like to be added?

New Search Architecture Based On Streaming Service

Why is this needed?

The current Streaming Service supports an embedded QueryNode to implement search/query (see also #38399). However, the old delegator logic is too heavy for the streaming node, and the current delegator architecture does not let us fully separate batch processing from streaming processing:

  • Cannot merge the flush process with the search build process, so recovery always consumes the WAL twice and memory usage doubles when a collection is loaded.
  • Cannot move all metadata management for growing data onto the streaming node, which would allow a lightweight coordinator for historical metadata.
  • Cannot remove the delegator's forwarding RPC to reduce the streaming node's workload.

Here's the new distributed architecture for the search/query process, based on the streaming service:

Image

The query process is implemented as shown in the diagram above, following a two-phase query approach:

  • The Coordinator generates a globally versioned query view, syncs it to all streaming nodes and query nodes, and keeps it consistent through a cross-node state machine.

  • Each QueryNode subscribes to the pure delete stream from the streaming node and applies the delete requests to all segments it holds.

  • Phase 1: The Proxy generates a shard-level query plan using the highest version of the QueryView from the StreamingNode:

    • Includes MVCC.
    • Query optimization (BM25, segment filtering, etc.).
    • Query view versioning.
  • Phase 2: The Proxy sends the query plan to both the StreamingNode and QueryNode:

    • The StreamingNode and QueryNode execute all query operations on their respective segments based on the specified view version (similar to the current SearchSegments process, but using version numbers instead of a segment list).
  • Final Step: The Proxy reduces all results and returns them to the user.
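
The two phases and the final reduce can be sketched as follows. This is only an illustrative model written in Python for brevity, not Milvus code: `QueryPlan`, `Node`, `two_phase_search`, and the tuple layout of the stored rows are all hypothetical names and shapes chosen for the example, and MVCC is reduced to a simple timestamp filter.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical shard-level plan produced in phase 1."""
    view_version: int    # query view version the plan was built against
    mvcc_timestamp: int  # read timestamp used for MVCC visibility
    top_k: int


@dataclass
class Node:
    """Stand-in for a StreamingNode or QueryNode (illustrative only)."""
    name: str
    # view version -> rows of (primary key, distance, insert timestamp)
    views: dict = field(default_factory=dict)

    def latest_view_version(self) -> int:
        return max(self.views)

    def execute(self, plan: QueryPlan) -> list:
        # Phase 2: the node resolves its own segments from the view version,
        # so the request carries a version number instead of a segment list.
        if plan.view_version not in self.views:
            raise KeyError(f"{self.name}: unknown view version {plan.view_version}")
        rows = self.views[plan.view_version]
        visible = [(pk, dist) for pk, dist, ts in rows if ts <= plan.mvcc_timestamp]
        return sorted(visible, key=lambda hit: hit[1])[: plan.top_k]


def two_phase_search(streaming_node, query_nodes, mvcc_timestamp, top_k):
    # Phase 1: plan against the highest view version known to the streaming node.
    plan = QueryPlan(streaming_node.latest_view_version(), mvcc_timestamp, top_k)
    # Phase 2: send the same plan to the streaming node and every query node.
    partials = [node.execute(plan) for node in [streaming_node, *query_nodes]]
    # Final step: reduce the partial results into a global top-k.
    merged = sorted((hit for part in partials for hit in part), key=lambda hit: hit[1])
    return plan, merged[:top_k]
```

In this sketch a node that does not know the requested view version raises an error, which is where the real design would cancel and retry the query.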

During this process, if a node crashes or the view becomes invalid, the process is canceled and the query operation is retried.
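
A minimal sketch of this cancel-and-retry behaviour, in Python for brevity. The exception types, the `search_with_retry` helper, the bounded attempt count, and the choice to rebuild the plan on every attempt are assumptions made for illustration; the issue itself only states that the query is canceled and retried.

```python
class ViewInvalidError(Exception):
    """Hypothetical error: a node no longer serves the requested view version."""


class NodeUnavailableError(Exception):
    """Hypothetical error: a node crashed or could not be reached."""


def search_with_retry(build_plan, execute_plan, max_attempts=3):
    """Run a query, re-planning and retrying when the view or a node fails.

    build_plan:   callable returning a fresh plan (phase 1)
    execute_plan: callable taking a plan and returning the result (phase 2 + reduce)
    """
    last_error = None
    for _ in range(max_attempts):
        # Re-plan on every attempt so a retry can pick up a newer query view.
        plan = build_plan()
        try:
            return execute_plan(plan)
        except (ViewInvalidError, NodeUnavailableError) as err:
            # The failed attempt is abandoned as a whole; nothing partial is kept.
            last_error = err
    raise RuntimeError(f"search failed after {max_attempts} attempts") from last_error
```

Any other exception type propagates immediately, so only the two failure modes named in the text trigger a retry.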

Here's the TODO list:

  • [ ] Versioned Data View to keep a view of historical data on a shard.
  • [ ] Versioned Query View to track the distributed load state of a loaded shard.
    • #40467
    • #40518
    • #40521
  • [ ] New balancer to generate query views automatically from the current cluster info.
  • [ ] Cross-node state machine to keep the query view consistent between the streaming node, query node, and coordinator.
  • [ ] Pure delete stream subscription that can start from any checkpoint.
  • [ ] Segment loader scheduler on the query node that acts on the query view.
  • [ ] Client side of the new search architecture, implemented in the proxy.
  • [ ] Server side of the new search architecture, implemented in the query node (qn) and streaming node (sn).
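
As a rough illustration of the delete-stream item, the sketch below replays a delete stream from an arbitrary checkpoint and applies each delete to every segment on a node. It is written in Python for brevity and is not Milvus code: `DeleteRecord`, `Segment`, `subscribe_deletes`, `apply_delete_stream`, and the use of an integer stream position as the checkpoint are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeleteRecord:
    position: int     # hypothetical monotonically increasing stream position
    primary_key: int
    timestamp: int    # delete timestamp, used for MVCC visibility


@dataclass
class Segment:
    segment_id: int
    primary_keys: set
    deleted: dict = field(default_factory=dict)  # primary key -> delete timestamp

    def apply_delete(self, record: DeleteRecord) -> None:
        # Only keys that live in this segment are affected; re-applying the
        # same record is harmless, which makes replay from a checkpoint safe.
        if record.primary_key in self.primary_keys:
            previous = self.deleted.get(record.primary_key, 0)
            self.deleted[record.primary_key] = max(previous, record.timestamp)


def subscribe_deletes(stream, checkpoint):
    """Yield only the delete records positioned after the checkpoint."""
    for record in stream:
        if record.position > checkpoint:
            yield record


def apply_delete_stream(stream, checkpoint, segments):
    """Apply deletes after `checkpoint` to all segments; return the new checkpoint."""
    for record in subscribe_deletes(stream, checkpoint):
        for segment in segments:
            segment.apply_delete(record)
        checkpoint = record.position
    return checkpoint
```

Because `apply_delete` is idempotent in this model, a node that restarts from an older checkpoint than necessary ends up in the same state as one that resumes from the exact position.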

Anything else?

No response

chyezh avatar Mar 07 '25 06:03 chyezh

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Apr 10 '25 00:04 stale[bot]
