Feature: Vector search recall accuracy rate observation

Open 634750802 opened this issue 1 year ago • 1 comments

Filter Vector Search Calls
- Identify and filter vector search calls that utilize a vector index.
- Apply a sampling rate (e.g., 1 out of every 10 calls) to selectively choose the vector search calls for further analysis.
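The filtering and sampling step could be sketched roughly as below. This is only an illustration: `SearchCall`, `uses_vector_index`, and `EveryNthSampler` are hypothetical names that do not come from the project, and a counter-based "every Nth call" rule is just one way to read "1 out of every 10 calls" (a random sampler would work as well).

```python
from dataclasses import dataclass


@dataclass
class SearchCall:
    """Hypothetical description of one vector search call."""
    query_text: str
    limit: int
    uses_vector_index: bool


class EveryNthSampler:
    """Keep every n-th call that goes through a vector index."""

    def __init__(self, n: int = 10):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self._seen = 0  # number of index-backed calls seen so far

    def should_sample(self, call: SearchCall) -> bool:
        # Calls that do not use a vector index are filtered out entirely
        # and do not advance the counter.
        if not call.uses_vector_index:
            return False
        self._seen += 1
        return self._seen % self.n == 0
```

With `n=10`, the 10th, 20th, 30th, ... index-backed calls are selected for further analysis.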

Generate and Execute New Vector Search Calls
- Create several new vector search calls based on the filtered data.
- Configure these new search calls with:
  - A larger LIMIT (topN): Increase the number of results returned to better analyze the search performance.
  - TiKV (full scan): Execute the queries on TiKV as a complete scan rather than through the vector index, so that all relevant data is retrieved and the results can serve as a baseline for comparison.
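To illustrate what the baseline query is expected to produce, here is a minimal in-memory sketch of an exact top-N search with an enlarged limit. It is a stand-in for the TiKV full scan, not the actual query: the function names, the `limit_multiplier` parameter, and the choice of cosine distance are all assumptions made for the example.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float],
    chunks: Dict[str, Sequence[float]],
    original_limit: int,
    limit_multiplier: int = 3,  # hypothetical factor for the "larger LIMIT"
) -> List[str]:
    """Scan every chunk and return the ids of the closest ones, nearest first."""
    k = original_limit * limit_multiplier
    return heapq.nsmallest(
        k, chunks, key=lambda chunk_id: cosine_distance(query, chunks[chunk_id])
    )
```

Because every chunk is compared against the query, the result is exact, which is what makes it usable as the expected set when measuring recall.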
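
Perform Calculations and Save Data
- For each sampled vector search call, compute the recall rate by comparing the chunks it returned against the chunks returned by the corresponding full-scan call, and collect the related information from both result sets.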
- Save the following information into a specified database for future analysis:
  - Text: The input text or query.
  - Limit: The specified result limit (topN).
  - Type: The type of operation, which can be either a vector-based or a graph-based search.
  - Embedding: The vector representation of the input query.
  - Recall Accuracy Rate: The fraction of the expected chunks that also appear in the chunks actually returned by the search.
  - Chunks Metadata: Metadata about the chunks (fragments) of data retrieved during the search.
  - Expected Chunks Metadata: Predefined or anticipated metadata about the chunks for comparison.
  - Knowledge Base ID: The identifier for the relevant knowledge base being searched.
  - Timestamp: The time the observation data was recorded.
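The calculation and the saved record might look roughly like the sketch below. The field names mirror the list above, but the `RecallObservation` class, the `build_observation` helper, and the assumption that each chunk's metadata carries an `"id"` key are hypothetical. Which expected set to compare against (for example, all full-scan results or only the first `limit` of them) is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


def recall_rate(retrieved_ids: Iterable, expected_ids: Iterable) -> float:
    """Fraction of expected ids that also appear among the retrieved ids."""
    expected = set(expected_ids)
    if not expected:
        return 1.0  # assumption: nothing was expected, so nothing was missed
    return len(expected & set(retrieved_ids)) / len(expected)


@dataclass
class RecallObservation:
    text: str
    limit: int
    type: str  # "vector" or "graph"
    embedding: List[float]
    recall_accuracy_rate: float
    chunks_metadata: List[dict]
    expected_chunks_metadata: List[dict]
    knowledge_base_id: int
    timestamp: str  # ISO 8601, UTC


def build_observation(
    text: str,
    limit: int,
    search_type: str,
    embedding: List[float],
    chunks: List[dict],
    expected_chunks: List[dict],
    knowledge_base_id: int,
) -> RecallObservation:
    """Compute recall from chunk ids and assemble one record to be saved."""
    rate = recall_rate(
        (c["id"] for c in chunks),
        (c["id"] for c in expected_chunks),
    )
    return RecallObservation(
        text=text,
        limit=limit,
        type=search_type,
        embedding=embedding,
        recall_accuracy_rate=rate,
        chunks_metadata=chunks,
        expected_chunks_metadata=expected_chunks,
        knowledge_base_id=knowledge_base_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

The resulting record can be turned into a plain dictionary with `dataclasses.asdict` before being written to whichever database is chosen for the observations.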

Dec 11 '24 09:12 634750802

@Icemap Does Ragas have the corresponding metric, and should we also include this metric as part of the evaluation?

Dec 12 '24 05:12 Mini256