[Question]: LlamaIndex multimodal PropertyGraphIndex with Neo4j

Open GobinFan opened this issue 10 months ago • 1 comment

Question Validation

  • [x] I have searched both the documentation and discord for an answer.

Question

How can I implement multimodal RAG based on PropertyGraphIndex combined with Neo4j, storing both image vectors and text vectors and supporting graph retrieval plus vector retrieval? Thanks, @jerryjliu

GobinFan · Feb 20 '25 10:02

Hey @GobinFan! I'm here to help you with any questions or issues you have. If you need assistance, feel free to ask!

To implement a multimodal RAG (Retrieval-Augmented Generation) system using a property graph index combined with Neo4j for storing image vectors and text vectors, you can follow these steps:

  1. Initialize Neo4j Graph Store: Connect to your Neo4j database using Neo4jGraphStore.

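    from llama_index.graph_stores.neo4j import Neo4jGraphStore

    # username, password, url, and database are your own connection settings.
    neo4j_graph_store = Neo4jGraphStore(
        username=username,
        password=password,
        url=url,
        database=database,
    )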
    
  2. Create Storage Context: Manage the storage of your graph data with StorageContext.

    from llama_index.core import StorageContext

    neo4j_storage_context = StorageContext.from_defaults(
        graph_store=neo4j_graph_store
    )
    
  3. Index Creation: Create a KnowledgeGraphIndex from your documents, including embeddings for vector retrieval.

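    from llama_index.core import KnowledgeGraphIndex

    # docs is the list of Document objects you have already loaded.
    neo4j_index = KnowledgeGraphIndex.from_documents(
        documents=docs,
        storage_context=neo4j_storage_context,
        max_triplets_per_chunk=10,
        include_embeddings=True,
    )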
    
  4. Node Parsing: Use a SentenceSplitter to parse nodes from documents.

    from llama_index.core.node_parser import SentenceSplitter

    node_parser = SentenceSplitter(chunk_size=512)
    nodes = node_parser.get_nodes_from_documents(docs)
    
  5. Vector Index: Create a VectorStoreIndex for vector retrieval.

    from llama_index.core import VectorStoreIndex
    vector_index = VectorStoreIndex(nodes=nodes)
    
  6. Query Engine Setup: For a combination of vector and graph retrieval (the RAW_VECTOR_KG_COMBO mode), wrap both retrievers in the CustomRetriever from step 7 and pass it to a RetrieverQueryEngine.

    from llama_index.core import get_response_synthesizer
    from llama_index.core.query_engine import RetrieverQueryEngine
    from llama_index.core.retrievers import KGTableRetriever, VectorIndexRetriever

    # Corresponds to the RAW_VECTOR_KG_COMBO branch: vector + KG retrieval combined.
    neo4j_vector_retriever = VectorIndexRetriever(index=vector_index)
    neo4j_kg_retriever = KGTableRetriever(
        index=neo4j_index, retriever_mode="keyword", include_text=False
    )
    # CustomRetriever is the class defined in step 7.
    neo4j_custom_retriever = CustomRetriever(
        neo4j_vector_retriever, neo4j_kg_retriever
    )

    neo4j_response_synthesizer = get_response_synthesizer(
        response_mode="tree_summarize"
    )

    query_engine = RetrieverQueryEngine(
        retriever=neo4j_custom_retriever,
        response_synthesizer=neo4j_response_synthesizer,
    )
    
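  7. Custom Retriever: Implement the CustomRetriever used in step 6, which combines vector and graph retrieval. Define it before building the query engine.

    from typing import List

    from llama_index.core import QueryBundle
    from llama_index.core.retrievers import (
        BaseRetriever,
        KGTableRetriever,
        VectorIndexRetriever,
    )
    from llama_index.core.schema import NodeWithScore

    class CustomRetriever(BaseRetriever):
        """Runs a vector retriever and a KG retriever and merges their results."""

        def __init__(
            self,
            vector_retriever: VectorIndexRetriever,
            kg_retriever: KGTableRetriever,
            mode: str = "OR",
        ) -> None:
            self._vector_retriever = vector_retriever
            self._kg_retriever = kg_retriever
            if mode not in ("AND", "OR"):
                raise ValueError("Invalid mode.")
            self._mode = mode
            # Initialize BaseRetriever state (callback manager and so on).
            super().__init__()

        def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
            vector_nodes = self._vector_retriever.retrieve(query_bundle)
            kg_nodes = self._kg_retriever.retrieve(query_bundle)

            vector_ids = {n.node.node_id for n in vector_nodes}
            kg_ids = {n.node.node_id for n in kg_nodes}

            # If both retrievers return the same node, the KG copy wins.
            combined_dict = {n.node.node_id: n for n in vector_nodes}
            combined_dict.update({n.node.node_id: n for n in kg_nodes})

            # "AND" keeps nodes found by both retrievers; "OR" keeps nodes found by either.
            if self._mode == "AND":
                retrieve_ids = vector_ids.intersection(kg_ids)
            else:
                retrieve_ids = vector_ids.union(kg_ids)

            return [combined_dict[rid] for rid in retrieve_ids]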
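
The AND/OR merge in step 7 can be tried in isolation, without LlamaIndex or a Neo4j instance. The following is a minimal sketch, not the library's implementation: ScoredNode and combine are hypothetical names standing in for NodeWithScore and the body of _retrieve, and scores are assumed to be plain floats. Unlike step 7, this variant keeps the higher-scoring copy of a duplicated node and sorts the result, so the output order does not depend on set iteration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredNode:
    """Hypothetical stand-in for LlamaIndex's NodeWithScore."""
    node_id: str
    score: float


def combine(
    vector_nodes: List[ScoredNode],
    kg_nodes: List[ScoredNode],
    mode: str = "OR",
) -> List[ScoredNode]:
    """Merge two result lists by node ID using AND (intersection) or OR (union)."""
    if mode not in ("AND", "OR"):
        raise ValueError("Invalid mode.")

    # Keep the higher-scoring copy when both retrievers return the same node.
    best: Dict[str, ScoredNode] = {}
    for n in vector_nodes + kg_nodes:
        if n.node_id not in best or n.score > best[n.node_id].score:
            best[n.node_id] = n

    vector_ids = {n.node_id for n in vector_nodes}
    kg_ids = {n.node_id for n in kg_nodes}
    keep = vector_ids & kg_ids if mode == "AND" else vector_ids | kg_ids

    # Sort by descending score (ties broken by ID) for a deterministic order.
    return sorted((best[i] for i in keep), key=lambda n: (-n.score, n.node_id))


vector_hits = [ScoredNode("a", 0.9), ScoredNode("b", 0.4)]
kg_hits = [ScoredNode("b", 0.7), ScoredNode("c", 0.5)]

or_ids = [n.node_id for n in combine(vector_hits, kg_hits, "OR")]
and_ids = [n.node_id for n in combine(vector_hits, kg_hits, "AND")]
print(or_ids)   # ['a', 'b', 'c']
print(and_ids)  # ['b']
```

"OR" favors recall (anything either retriever found), while "AND" favors precision (only nodes both retrievers agree on) and can return an empty list when the two result sets do not overlap.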
    

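This setup lets you run both graph-based and vector-based retrieval, with Neo4j as the graph store [1]. Note that the steps above index text documents only and use KnowledgeGraphIndex rather than the PropertyGraphIndex asked about, so storing image vectors is not covered here.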
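
The steps above only show text documents. As a rough illustration of how image vectors and text vectors could be searched together, here is a dependency-free sketch. It assumes both modalities were embedded into one shared vector space (for example by a CLIP-style model), which is what makes a single similarity ranking meaningful. EmbeddedItem, cosine, and top_k are hypothetical names, the vectors are made up, and none of this is LlamaIndex or Neo4j API.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EmbeddedItem:
    """Hypothetical record: one stored vector plus the modality it came from."""
    item_id: str
    modality: str  # "text" or "image"
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query: List[float], items: List[EmbeddedItem], k: int = 2
) -> List[Tuple[float, EmbeddedItem]]:
    """Rank text and image items together by similarity to the query vector."""
    scored = [(cosine(query, item.embedding), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors; a real system would get these from an embedding model.
items = [
    EmbeddedItem("caption-1", "text", [1.0, 0.0, 0.0]),
    EmbeddedItem("photo-1", "image", [0.8, 0.6, 0.0]),
    EmbeddedItem("caption-2", "text", [0.0, 0.0, 1.0]),
]

hits = top_k([1.0, 0.0, 0.0], items, k=2)
hit_ids = [item.item_id for _, item in hits]
print(hit_ids)  # ['caption-1', 'photo-1']
```

Keeping the modality on each record lets downstream code treat image hits and text hits differently (for example, passing images to a multimodal LLM) while still ranking them in one pass.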


dosubot[bot] · Feb 20 '25 10:02