[Question]: Hybrid search?

Open chengyjonathan opened this issue 10 months ago • 4 comments

Question Validation

  • [X] I have searched both the documentation and discord for an answer.

Question

Hi there!

I'm wondering where would be the best place to get started if I wanted to implement a lexical, semantic, graph search hybrid.

I see the examples for doing lexical + semantic retrieval.

And I was wondering if there are any guides for doing semantic + kg retrieval.

chengyjonathan, Mar 28 '24 17:03

Hey @chengyjonathan! Great to see you venturing into new territories again 🚀

Based on the LlamaIndex codebase, the framework already appears to support hybrid retrieval over a knowledge graph. This is handled by the KGTableRetriever class: setting its retriever_mode parameter to KGRetrieverMode.HYBRID combines keyword matching and embedding similarity to find relevant triplets in the knowledge graph.

The logic lives in the _retrieve method of KGTableRetriever. It uses lexical search (via keywords) and semantic search (via embeddings) to retrieve relevant nodes from the knowledge graph and, when retriever_mode is HYBRID, merges the two result sets.

Here's a brief overview of how the hybrid search works:

  1. Keyword Extraction: The method starts by extracting keywords from the query string using the _get_keywords method, which leverages a language model for keyword extraction.

  2. Lexical Search: For each extracted keyword, it searches the knowledge graph for relevant nodes using the keyword search functionality (self._index_struct.search_node_by_keyword). This represents the lexical aspect of the search.

  3. Semantic Search: If embeddings are used (self._retriever_mode != KGRetrieverMode.KEYWORD), the method computes the query embedding using the provided embedding model (self._embed_model.get_text_embedding). It then retrieves relevant text snippets based on their semantic similarity to the query embedding, using the embeddings stored in self._index_struct.embedding_dict.

  4. Combining Results: If the retriever_mode is HYBRID, the method combines nodes found through both lexical and semantic searches. It ensures uniqueness of the results and prioritizes them based on relevance.

  5. Final Node Retrieval: The method retrieves the final set of nodes based on the combined results from both searches, along with their associated text snippets if include_text is set to True.

You can find this implementation in the llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py file.
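The steps above can be illustrated with a small, library-free sketch. This is not the KGTableRetriever code: the Mode enum, retrieve_triplets, and the toy keyword table and embeddings are hypothetical stand-ins, and keyword extraction (done by a language model in the description above) is replaced by a hand-supplied keyword list.

```python
import math
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for KGRetrieverMode.
    KEYWORD = "keyword"
    EMBEDDING = "embedding"
    HYBRID = "hybrid"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_triplets(query_keywords, query_embedding, keyword_table,
                      embedding_dict, mode=Mode.HYBRID, top_k=2):
    """Collect triplets by keyword lookup and/or embedding similarity."""
    results = []
    if mode != Mode.EMBEDDING:
        # Lexical step: look up each keyword in the keyword table.
        for keyword in query_keywords:
            for triplet in keyword_table.get(keyword, []):
                if triplet not in results:
                    results.append(triplet)
    if mode != Mode.KEYWORD:
        # Semantic step: rank stored triplets by similarity to the query.
        ranked = sorted(
            embedding_dict,
            key=lambda t: cosine(query_embedding, embedding_dict[t]),
            reverse=True,
        )
        for triplet in ranked[:top_k]:
            if triplet not in results:  # keep results unique
                results.append(triplet)
    return results


# Toy data, invented for illustration only.
keyword_table = {
    "paris": ["(Paris, capital_of, France)"],
    "france": ["(Paris, capital_of, France)", "(France, member_of, EU)"],
}
embedding_dict = {
    "(Paris, capital_of, France)": [1.0, 0.0],
    "(France, member_of, EU)": [0.6, 0.8],
    "(Berlin, capital_of, Germany)": [0.9, 0.1],
}

hits = retrieve_triplets(["paris"], [1.0, 0.0], keyword_table, embedding_dict)
print(hits)
```

In this toy run the keyword step finds only the Paris triplet, while the embedding step also surfaces the Berlin triplet, so the hybrid result contains both.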

I hope this helps! If you have any further questions or need more clarification, feel free to ask.

dosubot[bot], Mar 28 '24 17:03

I think any custom retriever could do this. Or use the query fusion retriever and pass in all the retrievers you want to mix.

https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion
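To give a rough idea of what score-based fusion involves, here is a minimal plain-Python sketch. It is not the library's implementation: min_max_normalize, fuse_by_relative_score, and the toy scores are made up for illustration, and the rule shown (min-max normalize each retriever's scores, then take a weighted sum) is only one simple way to make scores from different retrievers comparable.

```python
def min_max_normalize(scored):
    """Rescale one retriever's scores to the range [0, 1]."""
    if not scored:
        return {}
    lo, hi = min(scored.values()), max(scored.values())
    if hi == lo:
        return {doc: 1.0 for doc in scored}
    return {doc: (s - lo) / (hi - lo) for doc, s in scored.items()}


def fuse_by_relative_score(result_sets, weights=None, top_k=3):
    """Weighted sum of normalized scores across several retrievers."""
    weights = weights or [1.0 / len(result_sets)] * len(result_sets)
    fused = {}
    for scored, weight in zip(result_sets, weights):
        for doc, score in min_max_normalize(scored).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical outputs: BM25 scores and cosine similarities live on
# different scales, which is why each set is normalized before mixing.
bm25_hits = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}
vector_hits = {"doc_b": 0.90, "doc_d": 0.80, "doc_a": 0.70}

fused = fuse_by_relative_score([bm25_hits, vector_hits])
print(fused)
```

Here doc_b ranks first because both retrievers score it well, even though neither BM25 nor the vector search alone would put it far ahead.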

logan-markewich, Mar 28 '24 18:03

Forgive me if this is wrong, but would I need two separate indexes? One composed of kg triplets?

Or could I just mix bm25, embedding based, and a kg based retriever?

chengyjonathan, Mar 28 '24 18:03

@chengyjonathan 3 retrievers, that then get composed into one. How you get those retrievers is up to you 👍🏻
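As a rough sketch of that composition, assuming each retriever is just a function from a query to a ranked list of document ids: the FusedRetriever class, the three stub retrievers, and the document ids below are all hypothetical, and reciprocal rank fusion is swapped in as a simple merging rule rather than taken from any particular library. Each retriever can sit on its own index (BM25 over text, a vector index, a KG index); only the ranked outputs are combined.

```python
from typing import Callable, Dict, List

# A retriever here is any callable mapping a query to ranked doc ids.
Retriever = Callable[[str], List[str]]


class FusedRetriever:
    """Compose several retrievers into one via reciprocal rank fusion."""

    def __init__(self, retrievers: List[Retriever], k: int = 60):
        self.retrievers = retrievers
        self.k = k  # damping constant commonly used with this rule

    def retrieve(self, query: str, top_n: int = 3) -> List[str]:
        scores: Dict[str, float] = {}
        for retriever in self.retrievers:
            for rank, doc in enumerate(retriever(query), start=1):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (self.k + rank)
        # Highest fused score first; ties broken by doc id for stability.
        return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Stub retrievers standing in for BM25, embedding, and KG retrieval.
def bm25_retriever(query: str) -> List[str]:
    return ["doc_a", "doc_b", "doc_c"]


def embedding_retriever(query: str) -> List[str]:
    return ["doc_b", "doc_a", "doc_d"]


def kg_retriever(query: str) -> List[str]:
    return ["doc_b", "doc_e"]


hybrid = FusedRetriever([bm25_retriever, embedding_retriever, kg_retriever])
top = hybrid.retrieve("who founded the company?")
print(top)
```

Rank-based fusion like this sidesteps the question of comparing raw scores across retrievers, at the cost of ignoring how confident each retriever was.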

logan-markewich, Mar 28 '24 18:03