GraphRAG - vector search

Open jexp opened this issue 1 year ago • 4 comments

Thanks for adding GraphRAG to RAGbuilder.

I had some questions and suggestions, perhaps you want to chat some time.

  • QQ: in graphrag.full_retriever you fetch the vector store data but don't use it in the method or the returns, looks redundant?
        def full_retriever(question: str):
            graph_data = graph_retriever(question)
            vector_data = [el.page_content for el in vector_retriever.invoke(question)]
            final_data = f'''Graph data:
        {graph_data}
            '''
            return final_data
  • You don't make use of the built-in Neo4j vector search, only the fulltext index. With vector search you could allow in-graph vector and hybrid search (you can create vector indexes for chunks in the lexical graph, for entities in the domain graph, and for communities in the topical structures).
  • right now the graph retriever only uses the direct neighbourhood of the nodes, this could be a good hyperparameter to add
  • e.g. we have a number of different retrievers in the llm-graph-builder, see: https://github.com/neo4j-labs/llm-graph-builder/blob/DEV/backend/src/shared/constants.py
  • I saw you copied some code from the neo4j-langchain integrations? Was there a reason (i.e. did you make modifications - if so it might be good to discuss to rather contribute them back upstream?)
  • there is the option to run clustering algorithms to generate cross-document topic summaries across the entity graphs (like in the MSFT GraphRAG paper), see https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/ (we've also implemented that in https://llm-graph-builder.neo4jlabs.com if you have a graph data science enabled database).

We have documented more GraphRAG patterns, here just in case you want to share your RAG patterns to the catalogue or provide some feedback:

  • https://neo4j.com/developer-blog/graphrag-field-guide-rag-patterns/
  • https://graphr.rag

jexp avatar Oct 05 '24 22:10 jexp

Hi @jexp, thanks for your questions & thoughts!

@ashwinzyx - perhaps, you can take a look once you're back.

aravind10x avatar Oct 08 '24 04:10 aravind10x

Hi @jexp, thanks for looking at our repo. Apologies for the delay. Just got back from vacation.

  • QQ: in graphrag.full_retriever you fetch the vector store data but don't use it in the method or the returns, looks redundant?

     def full_retriever(question: str):
          graph_data = graph_retriever(question)
          vector_data = [el.page_content for el in vector_retriever.invoke(question)]
          final_data = f'''Graph data:
      {graph_data}
          '''
          return final_data
    

[Ans] Yes. It looks like we are not using vector_data for the Graph RAG; we only use it for the Hybrid RAG. Will remove it.
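
For reference, a minimal sketch of what full_retriever might look like once the unused vector lookup is dropped. The factory function and the stub graph retriever are hypothetical, added only so the snippet runs on its own; the trailing whitespace of the returned string is also simplified compared with the snippet quoted above.

```python
from typing import Callable


def make_full_retriever(graph_retriever: Callable[[str], str]) -> Callable[[str], str]:
    """Build a full_retriever that returns graph context only (hypothetical helper)."""

    def full_retriever(question: str) -> str:
        # Only the graph lookup remains; the vector store is no longer queried.
        graph_data = graph_retriever(question)
        return f"Graph data:\n{graph_data}\n"

    return full_retriever


def fake_graph_retriever(question: str) -> str:
    """Stand-in for the real graph_retriever (assumption, for illustration only)."""
    return f"(:Entity)-[:MENTIONS]->(:Chunk) for '{question}'"


full_retriever = make_full_retriever(fake_graph_retriever)
print(full_retriever("Who founded Neo4j?"))
```

Dropping the call also avoids one embedding lookup per question in the Graph RAG path.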

  • You don't make use of the built-in Neo4j vector search, only the fulltext index. With vector search you could allow in-graph vector and hybrid search (you can create vector indexes for chunks in the lexical graph, for entities in the domain graph, and for communities in the topical structures).

[Ans] We have been using Chroma for vector search in the templates. I do see hybrid search options in the examples below: https://python.langchain.com/docs/integrations/vectorstores/neo4jvector/ https://neo4j.com/labs/genai-ecosystem/langchain/

I believe the example below uses in-graph vector search. Am I right? Is there a full example you can share for in-graph vector search? https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/
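
As a rough illustration of what "in-graph" vector search means, the sketch below builds a Cypher query around Neo4j's db.index.vector.queryNodes procedure, so the similarity lookup runs inside the graph database instead of in a separate store such as Chroma. The index name, the `text` property, and the helper function are assumptions made for illustration; they are not taken from RAGBuilder or from the linked posts.

```python
from typing import Any


def vector_search_query(
    index_name: str, embedding: list[float], k: int = 5
) -> tuple[str, dict[str, Any]]:
    """Return a Cypher query and its parameters for a k-nearest-neighbour lookup.

    Assumes a vector index called `index_name` already exists on nodes
    that carry a `text` property (both are hypothetical here).
    """
    query = (
        "CALL db.index.vector.queryNodes($index_name, $k, $embedding) "
        "YIELD node, score "
        "RETURN node.text AS text, score "
        "ORDER BY score DESC"
    )
    params = {"index_name": index_name, "k": k, "embedding": embedding}
    return query, params


query, params = vector_search_query("chunk_embeddings", [0.1, 0.2, 0.3], k=3)
print(query)
```

The returned pair can be handed to whichever Neo4j client is in use. Because the matched nodes live in the graph, the same query could be extended with a MATCH clause to pull in their neighbours.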

  • right now the graph retriever only uses the direct neighbourhood of the nodes, this could be a good hyperparameter to add

[Ans] For now we have added GraphRAG as a template. We will include these as individual components with a hyperparameter tuning option.
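
One way the neighbourhood size could be exposed as a hyperparameter is sketched below: the hop count is validated and then written into a variable-length Cypher pattern. The `__Entity__` label, the `id` property, the upper bound of 5, and the function itself are assumptions for illustration, not RAGBuilder's actual retriever.

```python
def neighbourhood_query(max_hops: int = 1) -> str:
    """Build a Cypher query that collects nodes within `max_hops` of an entity.

    The hop count is interpolated into the query text, since the bounds of a
    variable-length pattern are written as literals; it is validated first
    so only a small integer can end up in the query.
    """
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be an integer between 1 and 5")
    return (
        "MATCH (e:__Entity__ {id: $entity_id})"
        f"-[*1..{max_hops}]-(neighbour) "
        "RETURN DISTINCT neighbour.id AS id"
    )


# max_hops=1 corresponds to the direct neighbourhood; 2 widens it by one hop.
print(neighbourhood_query(2))
```

Larger values return more context but can grow quickly on dense graphs, which is why a tuner would likely search only a small range.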

  • e.g. we have a number of different retrievers in the llm-graph-builder, see: https://github.com/neo4j-labs/llm-graph-builder/blob/DEV/backend/src/shared/constants.py

[Ans] Thanks for the pointer. Will take a look

  • I saw you copied some code from the neo4j-langchain integrations? Was there a reason (i.e. did you make modifications - if so it might be good to discuss to rather contribute them back upstream?)

[Ans] No. We did not make any modifications.

  • there is the option to run clustering algorithms to generate cross-document topic summaries across the entity graphs (like in the MSFT GraphRAG paper), see https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/ (we've also implemented that in https://llm-graph-builder.neo4jlabs.com/ if you have a graph data science enabled database).

[Ans] Thanks. Will take a look.

Thanks for all your feedback. Would be great to chat sometime. We want to improve the GraphRAG option in RAGBuilder and would love your contributions as well.

ashwinzyx avatar Oct 16 '24 08:10 ashwinzyx

@jexp - can you pls review @ashwinzyx's comments? Do you have any further thoughts or suggestions? Please feel free to suggest changes or raise a PR to make the Graph RAG part of RAGBuilder even better.

aravind10x avatar Oct 18 '24 15:10 aravind10x

@aravind10x it would probably be good to have a chat with me and @tomasonjo at some point, it's harder to go through these in GH issues :)

jexp avatar Oct 24 '24 22:10 jexp