graphrag
Add graphrag_import_neo4j_cypher Notebook
Description
Merge PR created for #544 Kudos to @jexp
Related Issues
#418 #433
Checklist
- [ ] I have tested these changes locally.
- [ ] I have reviewed the code changes.
- [ ] I have updated the documentation (if necessary).
- [ ] I have added appropriate unit tests (if applicable).
Thanks @AlonsoGuevara
Hey @AlonsoGuevara. I was testing the graphrag_import_neo4j_cypher.ipynb notebook (currently in the branch community/graphrag_import_neo4j_cypher), importing my own parquet files generated with GraphRAG into my Neo4j instance. I could run most of the initial cells and it looks like it's importing properly, but then, when loading the "COMMUNITY REPORT" data, it failed like this:
The error message is:
"ConstraintError: {code: Neo.ClientError.Schema.ConstraintValidationFailed} {message: Node(55) already exists with label __Community__ and property community = '0'}"
Not sure if that ID should be "Community_Report" instead of "Community" because that was already imported in the previous cell? Not sure, it might be a different reason... in any case, the notebook is failing in that cell.
Btw, I'm able to load all my test PARQUET files into "graphrag-visualizer" which worked right away: https://noworneverev.github.io/graphrag-visualizer/
However, I want to visualize with NEO4J because it looks more flexible and customizable. I want to be able to filter by certain flags/types marked on the entities, etc.
Do you know how to fix this issue/error? Or is it an issue with the notebook?
I'm attaching my parquet file (as mentioned they work on "graphrag-visualizer") so anyone can repro with this notebook for NEO4J. test-output-parquet-files.zip
Hmm, I guess we have to decide which field is the primary key for Community Reports; currently we have both community and id. It seems there is a conflict here, in that the same id is used with a different community.
Would be good to clarify which of the two is actually identifying a community uniquely.
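One way to check this against the exported data is sketched below. It is only an illustration, not code from the notebook: it assumes the community report rows have already been loaded as dictionaries (for example with pandas `to_dict("records")`), the helper name `find_key_conflicts` is made up, and the sample rows are invented rather than taken from the attached files.

```python
from collections import defaultdict


def find_key_conflicts(rows, key, other):
    """Return {key value: sorted distinct `other` values} for every
    key value that is paired with more than one `other` value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[other])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Invented sample rows (assumption): two reports share community "0".
rows = [
    {"id": "a1", "community": "0"},
    {"id": "b2", "community": "0"},
    {"id": "c3", "community": "1"},
]

# Communities that appear with several ids, and ids that appear with
# several communities. An empty dict means that column is unique here.
conflicts_by_community = find_key_conflicts(rows, "community", "id")
conflicts_by_id = find_key_conflicts(rows, "id", "community")

print(conflicts_by_community)
print(conflicts_by_id)
```

Whichever column comes back with no conflicts on the real data is the safer candidate for the uniqueness constraint.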
Adding @tomasonjo fix from #999
Thanks for your contribution!
Can you please provide retrieval notebooks as well, covering global and local search?
@k2ai Once it's in the Neo4j database, I'm using Neo4j Desktop or the cloud.
What kind of "retrieval notebooks" do you think would be useful?
This is how I use it in Neo4j Desktop; note the query in there, so it's dynamic:
Notebooks like global_search and local_search are in the examples folder, where we can ask questions directly in plain text instead of writing a Cypher query.
Like this:
k2ai - Understand, but that's unrelated to Neo4j, right? It's done with GraphRAG outputs and feeding back the GPT model. You won't use anything from Neo4j for that, right? - My point is that it's a different path and usage as compared to Neo4j, am I wrong?
You can look here for some inspiration for local and global search: https://towardsdatascience.com/integrating-microsoft-graphrag-into-neo4j-e0d4fa00714c
No, no, here Neo4j plays the main role. After importing data from the MS GraphRAG .parquet files into the Neo4j database, we should be able to query the Neo4j database directly, as in the example below, which covers both the local_search and global_search retrievers: https://github.com/tomasonjo/blogs/blob/master/msft_graphrag/ms_graphrag_retriever.ipynb
I tried this notebook and there are some errors in this.
Any more details about the errors?
There was a conflict between Python libraries, but it has been resolved now, and retrieval from Neo4j is working fine. The only issue is that the global query is taking far too much time.
I guess we could parallelize the intermediate step if needed. Depends obviously on the number of communities
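A rough sketch of what that parallelization could look like, using only the standard library. The names `map_communities` and `fake_answer` are hypothetical; `fake_answer` stands in for whatever per-community LLM call the retriever makes, which is assumed here to be I/O-bound so that threads help.

```python
from concurrent.futures import ThreadPoolExecutor


def map_communities(reports, answer_fn, max_workers=8):
    """Apply answer_fn to every community report concurrently.

    pool.map returns results in the same order as the input reports.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(answer_fn, reports))


def fake_answer(report):
    # Hypothetical placeholder for the per-community LLM request.
    return f"partial answer for community {report['community']}"


# Invented input (assumption): five community reports.
reports = [{"community": str(i)} for i in range(5)]
partials = map_communities(reports, fake_answer)
print(partials)
```

The partial answers would then go to the final reduce step as before. A sensible `max_workers` depends on the number of communities and on the rate limits of the model endpoint.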