llama_index
[Feature Request]: elasticsearch retriever wanted
Feature Description
I want to use an Elasticsearch retriever for keyword retrieval. What do I need? Is there an existing Retriever you would suggest?
Reason
Is there an existing Retriever you would suggest?
Value of Feature
No response
Hey @dl942702882, great to see you back! 🚀
Based on the context provided, the LlamaIndex library has a class called ElasticsearchStore that can be used for keyword retrieval in Elasticsearch. This class is designed to connect to an Elasticsearch instance and supports both vector and keyword search.
Here are the key points and requirements for implementing keyword retrieval using this approach:
- Elasticsearch Version: The ElasticsearchStore class requires Elasticsearch 8.9.0 or higher. This is crucial as the features and API compatibility might differ in earlier versions.
- Dependencies: You need to install the llama-index-vector-stores-elasticsearch and llama-index packages. These can be installed via pip as shown below:
  %pip install llama-index-vector-stores-elasticsearch
  !pip install llama-index
- Elasticsearch Setup: You can set up Elasticsearch either locally using Docker or through Elastic Cloud. For local setup, the provided Docker command simplifies the process. For Elastic Cloud, you need to sign up for a free trial and configure the connection using the cloud ID, and optionally, an API key, username, and password.
- Connecting to Elasticsearch: The ElasticsearchStore class requires parameters such as index_name and optionally es_url, es_cloud_id, es_api_key, es_user, and es_password for connecting to an Elasticsearch instance. Examples for both local and cloud connections are provided.
- Keyword Search Implementation: While the context primarily focuses on setting up and connecting to Elasticsearch, implementing keyword search would typically involve using Elasticsearch's query DSL to perform searches on text fields. The ElasticsearchStore class likely abstracts some of these details, but specific methods for keyword search are not detailed in the provided context.
- Custom Filters and Queries: The context mentions the ability to use custom filters and override the query function for more advanced search capabilities, including keyword searches. This suggests that for keyword retrieval, one might need to customize the query sent to Elasticsearch to match specific keywords.
For keyword search, the ElasticsearchStore class provides two methods: query for synchronous keyword search and aquery for asynchronous keyword search. Here are examples of how to use these methods:
- For query:
  res = es_store.query(
      VectorStoreQuery(
          query_str="lorem",
          mode=VectorStoreQueryMode.TEXT_SEARCH,
          similarity_top_k=1,
      )
  )
- For aquery:
  res = await es_store.aquery(
      VectorStoreQuery(
          query_str="lorem",
          mode=VectorStoreQueryMode.TEXT_SEARCH,
          similarity_top_k=1,
      )
  )
You can find more details about these methods in the test_vector_stores_elasticsearch.py file.
Please note that the ElasticsearchStore class is part of the llama-index-vector-stores-elasticsearch package, which is a separate package from the main llama-index package. Make sure to install both packages to use this class.
I hope this information helps you in implementing keyword retrieval in your project. If you have any more questions or need further clarification, feel free to ask.
Sources
- docs/docs/examples/vector_stores/ElasticsearchIndexDemo.ipynb
- llama-index-integrations/storage/index_store/llama-index-storage-index-store-elasticsearch/llama_index/storage/index_store/elasticsearch/base.py
- llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/tests/test_vector_stores_elasticsearch.py
I want to use Elasticsearch not for vector search, but for keyword retrieval through its inverted index.
How about setting alpha=0.0?
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid", similarity_top_k=2, alpha=0.0
)
@dl942702882 From version 0.2.0 of llama-index-vector-stores-elasticsearch you can do keyword retrieval, see https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/ElasticsearchIndexDemo.ipynb
@maxjakob Hello, does this require a paid Elasticsearch license? I updated llama-index-vector-stores-elasticsearch, and hybrid search keeps failing with "elasticsearch.AuthorizationException: AuthorizationException(403, 'security_exception', 'current license is non-compliant for [Reciprocal Rank Fusion (RRF)]')"
You can also use a 30-day trial version of ES with xpack.license.self_generated.type=trial.
See https://docs.llamaindex.ai/en/stable/examples/vector_stores/ElasticsearchIndexDemo/
@yzgrfsy Keyword-only retrieval (BM25) is a regular feature of Elasticsearch. Hybrid search (dense vector + BM25 retrieval, combining the results using RRF) is a licensed feature that you can trial with the parameter that @adrian539859 shared or with a trial in Elastic Cloud.
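For readers unfamiliar with what BM25 keyword scoring does, here is a minimal pure-Python sketch. It is not how Elasticsearch is implemented: it assumes plain whitespace tokenization instead of an analyzer, and the function name bm25_scores and the sample documents are made up for illustration. The k1=1.2 and b=0.75 values match Elasticsearch's default BM25 parameters.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with a simplified BM25.

    Assumption: lowercase + whitespace split stands in for a real analyzer.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the quick brown fox",
    "elasticsearch keyword retrieval with an inverted index",
    "dense vector search with embeddings",
]
scores = bm25_scores("keyword retrieval", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Documents that share no term with the query score zero, which is why keyword-only retrieval needs no embeddings and no license beyond the basic one.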
@dl942702882 Thank you for your answer.
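@maxjakob Thank you for your answer. I have already created the index in ES. Now, following the official LlamaIndex guide at https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/, I am creating a custom retriever (combining a vector retriever and a BM25 retriever) to implement a Hybrid Retriever: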
class HybridRetriever(BaseRetriever):
    def __init__(self, vector_retriever, bm25_retriever):
        self.vector_retriever = vector_retriever
        self.bm25_retriever = bm25_retriever
        super().__init__()

    def _retrieve(self, query, **kwargs):
        bm25_nodes = self.bm25_retriever.retrieve(query, **kwargs)
        vector_nodes = self.vector_retriever.retrieve(query, **kwargs)
        # combine the two lists of nodes, keeping the first occurrence of each node ID
        all_nodes = []
        node_ids = set()
        for n in bm25_nodes + vector_nodes:
            if n.node.node_id not in node_ids:
                all_nodes.append(n)
                node_ids.add(n.node.node_id)
        return all_nodes
However, I ran into a problem: I need to instantiate two retrievers. The vector_retriever is instantiated like this:
vector_store = ElasticsearchStore(index_name=es_index_name, es_url=CONS_ELASTIC_SEARCH_URL)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(
    embed_model=Settings.embed_model,
    vector_store=vector_store,
    storage_context=storage_context,
)
vector_retriever = index.as_retriever(similarity_top_k=10)
But instantiating the bm25_retriever fails with either of the following:
bm25_retriever = BM25Retriever.from_defaults(index=index, similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)
Do you know how to get the nodes from an index that has already been saved in ES?
@yzgrfsy There is a simpler way to do hybrid retrieval, see the documentation:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.elasticsearch import (
    AsyncDenseVectorStrategy,
    ElasticsearchStore,
)

hybrid_store = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="xyz",
    retrieval_strategy=AsyncDenseVectorStrategy(hybrid=True),
)
storage_context = StorageContext.from_defaults(vector_store=hybrid_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
retriever = index.as_retriever()
print(retriever.retrieve(query))
This will use dense vector retrieval and BM25 and combine the results using RRF.
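To make the RRF step concrete, here is a minimal pure-Python sketch of Reciprocal Rank Fusion. Elasticsearch performs this fusion server-side, so you would not write it yourself with the hybrid strategy above. The function name reciprocal_rank_fusion, the document IDs, and the rank constant k=60 (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a BM25 query and a dense vector query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because RRF only looks at ranks, it needs no normalization between BM25 scores and vector similarity scores, which live on different scales.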