aws-genai-llm-chatbot
Enable Metadata Filter Queries for OpenSearch
Implement Approximate k-NN Search with Metadata Filtering in Amazon OpenSearch
Goal
- Implement self-querying or metadata filtering for approximate k-NN search in Amazon OpenSearch Service
Motivation
- Customers often need to filter on metadata, depending on the use case, e.g. by topic or business unit.
- As more data goes into a single index, it becomes harder both to match on semantic meaning and to filter, e.g. for entries added on a specific date.
- Filtering on metadata can improve both query performance (latency) and retrieval performance (relevance of the returned results).
Approaches
UI Integration
- Allow end users to specify metadata filters for an OpenSearch index
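As a minimal sketch of the UI approach, assume the UI sends its selections as a hypothetical mapping of field name to one value or a list of allowed values, and that the filterable fields are mapped as `keyword` in the index. The selections could then be translated into an OpenSearch `bool` filter clause. The function name and field names below are illustrative only.

```python
from typing import Any, Dict, List, Union

# Hypothetical shape of UI filter selections: field name -> one value or a list of allowed values.
FilterValue = Union[str, int, List[Union[str, int]]]


def build_metadata_filter(selections: Dict[str, FilterValue]) -> Dict[str, Any]:
    """Translate UI filter selections into an OpenSearch bool filter clause.

    A single value becomes a `term` clause; a list becomes a `terms` clause
    (match any of the listed values). All clauses must match (logical AND).
    Assumes the fields are mapped as `keyword` so exact matching works.
    """
    clauses: List[Dict[str, Any]] = []
    for field, value in selections.items():
        if isinstance(value, list):
            clauses.append({"terms": {field: value}})
        else:
            clauses.append({"term": {field: value}})
    return {"bool": {"filter": clauses}}


if __name__ == "__main__":
    # Example: the user picked one business unit and two topics in the UI.
    ui_selection = {"business_unit": "finance", "topic": ["pricing", "billing"]}
    print(build_metadata_filter(ui_selection))
```

Using the `filter` context (rather than `must`) keeps these clauses out of relevance scoring, which fits metadata constraints that are simply pass/fail.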
Query Rewriting by LLM
- Have the LLM itself rewrite the query before search execution
- The LangChain self-query retriever is one example of how to do this: https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/
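One hedged way to use LLM query rewriting without trusting the model's output blindly is to prompt the LLM to return JSON of an assumed shape, `{"query": "...", "filters": {...}}`, and validate it against an allowlist of metadata fields before building the OpenSearch filter. The JSON schema, `ALLOWED_FIELDS`, and `parse_rewritten_query` are hypothetical illustrations, not LangChain's API, and the LLM call itself is omitted.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical allowlist: only these metadata fields may be filtered on.
ALLOWED_FIELDS = {"topic", "business_unit", "created_at"}
# OpenSearch range operators accepted from the LLM output.
RANGE_OPS = {"gt", "gte", "lt", "lte"}


def parse_rewritten_query(llm_output: str) -> Tuple[str, Dict[str, Any]]:
    """Parse an LLM response of the (assumed) form
    {"query": "<semantic search text>", "filters": {"<field>": <value>}}
    into the search text and an OpenSearch bool filter clause.

    Fields outside the allowlist and unknown range operators are dropped.
    """
    data = json.loads(llm_output)
    query = str(data.get("query", ""))
    clauses: List[Dict[str, Any]] = []
    for field, value in (data.get("filters") or {}).items():
        if field not in ALLOWED_FIELDS:
            continue  # ignore fields the LLM invented
        if isinstance(value, dict):
            # e.g. {"gte": "2023-06-01"} -> range clause on a date field
            bounds = {op: v for op, v in value.items() if op in RANGE_OPS}
            if bounds:
                clauses.append({"range": {field: bounds}})
        elif isinstance(value, list):
            clauses.append({"terms": {field: value}})
        else:
            clauses.append({"term": {field: value}})
    return query, {"bool": {"filter": clauses}}
```

For a user question like "finance refund policy changes since June 2023", the LLM might return `{"query": "refund policy", "filters": {"business_unit": "finance", "created_at": {"gte": "2023-06-01"}}}`. The semantic part goes to the embedding model, and the structured part becomes the filter.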
OpenSearch 2.9 on Amazon OpenSearch Service offers efficient vector query filtering with the FAISS engine, which applies metadata filters quickly during the approximate k-NN search itself.
UI integration may be the best way to expose filters to users, while LLM query rewriting could enable natural-language filtering that complements the UI.
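A sketch of what an efficient-filtering request could look like, assuming a hypothetical index whose vector field `embedding` uses the Faiss HNSW method and whose metadata fields `business_unit` and `created_at` are mapped as `keyword` and `date`. The filter is placed inside the `knn` clause, so it is applied during the approximate search rather than to the top-k results afterwards. Index and field names, dimension, and space type are illustrative assumptions.

```python
from typing import Any, Dict, List


def build_index_mapping(vector_field: str, dimension: int) -> Dict[str, Any]:
    """Index settings/mappings for a k-NN index using the Faiss HNSW method.

    Metadata fields (hypothetical names) are mapped so they can be filtered exactly.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                "business_unit": {"type": "keyword"},
                "created_at": {"type": "date"},
            }
        },
    }


def build_filtered_knn_query(
    vector_field: str,
    query_vector: List[float],
    k: int,
    metadata_filter: Dict[str, Any],
) -> Dict[str, Any]:
    """Build a k-NN search body with the metadata filter inside the knn clause
    (efficient filtering), instead of post-filtering the top-k hits."""
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": query_vector,
                    "k": k,
                    "filter": metadata_filter,
                }
            }
        },
    }
```

With the `opensearch-py` client, the mapping could be passed to `client.indices.create(index=..., body=...)` and the query to `client.search(index=..., body=...)`. Check the k-NN plugin documentation for the exact behavior in your OpenSearch version, since efficient filtering with Faiss requires 2.9 or later.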
Key questions:
- Which metadata fields should be filterable?
- How can the UI support both basic and advanced filtering?
- What performance issues emerge with heavy filtering?
- Should we start with one approach or both?
Your thoughts on these questions and on the overall direction are valuable. If you'd like to contribute, share ideas or open a draft PR for feedback, and let's refine the design and implementation together. Excited to see this progress!
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 30 days since being marked as stale.