
Enable Metadata Filter Queries for OpenSearch

Open philikai opened this issue 1 year ago • 1 comments

Enable Metadata Filter Queries for OpenSearch:

Implement Approximate KNN Search with Metadata Filtering in Amazon OpenSearch

Goal

  • Implement self-querying or metadata filtering for approximate KNN search in Amazon OpenSearch

Motivation

  • Customers often need to filter on metadata, depending on the use case, e.g. by topic or business unit.
  • The more data a single index holds, the harder it is to match on semantic meaning alone, and the more useful it becomes to filter, e.g. for entries added on a specific date.
  • Filtering on metadata can improve both query performance and retrieval quality.

Approaches

UI Integration

  • Allow end user to specify filters on metadata for an OpenSearch index
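As a rough illustration of the UI approach, the filter values a user picks could be translated into an OpenSearch `bool` filter along these lines. This is only a sketch: the function name and the field names (`business_unit`, `topic`, `created_at`) are hypothetical and not taken from this repository, and it assumes the filtered fields are mapped as `keyword` or `date` so that `term` and `range` clauses match as expected.

```python
from typing import Any


def build_metadata_filter(selections: dict[str, Any]) -> dict[str, Any]:
    """Translate UI filter selections into an OpenSearch bool filter.

    Hypothetical convention for this sketch:
      - scalar value  -> term clause (exact match)
      - list of values -> terms clause (match any)
      - dict with gte/lte/gt/lt -> range clause
      - None -> filter not set in the UI, skipped
    """
    clauses: list[dict[str, Any]] = []
    for field, value in selections.items():
        if value is None:
            continue
        if isinstance(value, dict):
            clauses.append({"range": {field: value}})
        elif isinstance(value, (list, tuple, set)):
            clauses.append({"terms": {field: list(value)}})
        else:
            clauses.append({"term": {field: value}})
    return {"bool": {"filter": clauses}}


if __name__ == "__main__":
    ui_selection = {
        "business_unit": "finance",
        "topic": ["tax", "audit"],
        "created_at": {"gte": "2024-01-01"},
    }
    print(build_metadata_filter(ui_selection))
```

The resulting dictionary can be passed wherever OpenSearch accepts a filter clause, including the `filter` parameter of a k-NN query.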

Query Rewriting by LLM

  • Have the LLM itself rewrite the query before search execution
  • LangChain's self-query retriever is one example of how to do this: https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/
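To make the query-rewriting idea concrete, here is a minimal hand-rolled sketch of the post-processing step. It is not LangChain's API. It assumes the LLM has been prompted to return JSON of the form `{"query": "...", "filter": {...}}`; that output format, the function name, and the allow-listed field names are all assumptions made for illustration. The point is that the model's output is validated against known metadata fields before it reaches OpenSearch, and that the original question is used as a fallback when the output cannot be parsed.

```python
import json
from typing import Any

# Hypothetical allow-list of filterable metadata fields.
ALLOWED_FIELDS = frozenset({"topic", "business_unit", "created_at"})


def parse_rewritten_query(
    llm_output: str,
    user_question: str,
    allowed_fields: frozenset[str] = ALLOWED_FIELDS,
) -> tuple[str, dict[str, Any]]:
    """Split LLM output into a semantic query and a validated metadata filter.

    Falls back to the unmodified user question with no filter if the
    output is not a JSON object.
    """
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return user_question, {}
    if not isinstance(data, dict):
        return user_question, {}

    # Use the rewritten query if present, otherwise keep the original question.
    query = str(data.get("query") or user_question).strip()

    raw_filter = data.get("filter")
    if not isinstance(raw_filter, dict):
        raw_filter = {}
    # Drop any field the model invented that is not in the allow-list.
    filters = {k: v for k, v in raw_filter.items() if k in allowed_fields}
    return query, filters


if __name__ == "__main__":
    output = '{"query": "travel policy", "filter": {"business_unit": "finance"}}'
    print(parse_rewritten_query(output, "What is the finance travel policy?"))
```

The allow-list matters because an LLM can produce plausible but nonexistent field names, and a filter on a missing field would silently return no results.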

philikai avatar Feb 26 '24 13:02 philikai

OpenSearch 2.9 on Amazon OpenSearch Service supports efficient k-NN filtering with the Faiss engine, which applies metadata filters during approximate k-NN search rather than only afterwards.
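For reference, a filtered approximate k-NN request body might be built roughly as follows. This is a sketch under assumptions: the helper name, the default vector field name `vector_field`, and the `topic` metadata field are placeholders, and the index is assumed to use an engine that supports filtering inside the `knn` clause.

```python
from typing import Any, Optional, Sequence


def build_filtered_knn_query(
    vector: Sequence[float],
    k: int,
    metadata_filter: Optional[dict[str, Any]] = None,
    vector_field: str = "vector_field",  # placeholder field name
) -> dict[str, Any]:
    """Build a k-NN search body with an optional filter inside the knn clause."""
    knn_params: dict[str, Any] = {"vector": list(vector), "k": k}
    if metadata_filter:
        # Placing the filter inside the knn clause lets OpenSearch apply it
        # during the vector search instead of as a post-filter.
        knn_params["filter"] = metadata_filter
    return {"size": k, "query": {"knn": {vector_field: knn_params}}}


if __name__ == "__main__":
    body = build_filtered_knn_query(
        vector=[0.1, 0.2, 0.3],
        k=5,
        metadata_filter={"bool": {"filter": [{"term": {"topic": "tax"}}]}},
    )
    print(body)
```

The returned dictionary is what would be sent as the body of a search request. Without a filter it degrades to a plain approximate k-NN query.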

UI integration may best expose filters to users, while LLM query rewriting could enable natural language filtering to complement the UI.

Key questions:

  • Which metadata fields should be filterable?
  • How can the UI support both basic and advanced filtering?
  • What performance issues emerge with heavy filtering?
  • Should we start with one approach or both?

Your thoughts on these questions and the direction are valuable. A PR would be welcome if you'd like to contribute. Let's refine the design and implementation together.

Share ideas or open a draft PR for feedback. Excited to see this progress!

ystoneman avatar Apr 13 '24 19:04 ystoneman

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Jun 13 '24 01:06 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Jul 13 '24 01:07 github-actions[bot]