
Feature: Vector search recall accuracy rate observation

Open 634750802 opened this issue 1 year ago • 1 comment

  1. Filter Vector Search Calls

    • Identify and filter vector search calls that utilize a vector index.
    • Apply a sampling rate (e.g., 1 out of every 10 calls) to selectively choose the vector search calls for further analysis.
  2. Generate and Execute New Vector Search Calls

    • Create several new vector search calls based on the filtered data.
    • Configure these new search calls with:
      • A larger LIMIT (topN): Increase the number of results returned to better analyze the search performance.
      • TiKV (full scan): Execute the queries using TiKV to perform a complete scan, ensuring all relevant data is retrieved.
  3. Perform Calculations and Save Data

    • Compute necessary metrics and collect relevant information from the search results for each sampled vector search call.
    • Save the following information into a specified database for future analysis:
      • Text: The input text or query.
      • Limit: The specified result limit (topN).
      • Type: The type of operation, either a vector-based or a graph-based search.
      • Embedding: The vector representation of the input query.
      • Recall Accuracy Rate: The share of the expected chunks (from the full scan) that the original indexed search also returned.
      • Chunks Metadata: Metadata about the chunks (fragments) of data retrieved during the search.
      • Expected Chunks Metadata: Predefined or anticipated metadata about the chunks for comparison.
      • Knowledge Base ID: The identifier for the relevant knowledge base being searched.
      • Timestamp: The time the observation data was recorded.
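
As a rough illustration of steps 1 and 3, here is a minimal Python sketch of the sampling decision, the recall calculation, and the record to be saved. Everything named here is hypothetical and not taken from the autoflow codebase: `should_sample`, `recall_rate`, `RecallObservation`, and `build_observation` are made-up names, chunks are assumed to be dicts carrying an `"id"` key, and the expected set is assumed to be the first `limit` rows of the full-scan result.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timezone

SAMPLE_RATE = 0.1  # assumption: roughly 1 out of every 10 calls


def should_sample(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Decide whether this vector search call is picked for analysis."""
    return rng.random() < rate


def recall_rate(retrieved_ids, expected_ids) -> float:
    """Fraction of expected (full-scan) chunks that the indexed search returned."""
    expected = set(expected_ids)
    if not expected:
        return 1.0  # nothing was expected, so nothing was missed
    return len(expected & set(retrieved_ids)) / len(expected)


@dataclass
class RecallObservation:
    """One saved observation; fields mirror the list in step 3."""
    text: str
    limit: int
    type: str  # "vector" or "graph"
    embedding: list
    recall_rate: float
    chunks_metadata: list
    expected_chunks_metadata: list
    knowledge_base_id: int
    timestamp: datetime


def build_observation(text, limit, search_type, embedding,
                      chunks, full_scan_chunks, knowledge_base_id):
    """Compare the indexed result with the full-scan result and build a record.

    Assumption: the full scan ran with a larger LIMIT, so only its first
    `limit` rows count as the expected chunks.
    """
    expected = full_scan_chunks[:limit]
    rate = recall_rate([c["id"] for c in chunks], [c["id"] for c in expected])
    return RecallObservation(
        text=text,
        limit=limit,
        type=search_type,
        embedding=embedding,
        recall_rate=rate,
        chunks_metadata=chunks,
        expected_chunks_metadata=expected,
        knowledge_base_id=knowledge_base_id,
        timestamp=datetime.now(timezone.utc),
    )
```

Random sampling is used here instead of a strict every-tenth-call counter so that no shared state is needed across workers; a counter would work equally well if exact 1-in-10 spacing matters.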

634750802, Dec 11 '24 09:12

@Icemap Does Ragas have a corresponding metric, and should we also include this metric as part of the evaluation?

Mini256, Dec 12 '24 05:12