Add graphrag_import_neo4j_cypher Notebook

Open AlonsoGuevara opened this issue 1 year ago • 3 comments

Description

Merge PR created for #544. Kudos to @jexp.

Related Issues

#418 #433

Checklist

  • [ ] I have tested these changes locally.
  • [ ] I have reviewed the code changes.
  • [ ] I have updated the documentation (if necessary).
  • [ ] I have added appropriate unit tests (if applicable).

AlonsoGuevara avatar Jul 17 '24 00:07 AlonsoGuevara

Thanks @AlonsoGuevara

jexp avatar Jul 17 '24 07:07 jexp

Hey @AlonsoGuevara. I was testing the graphrag_import_neo4j_cypher.ipynb notebook (currently in the branch community/graphrag_import_neo4j_cypher), importing my own parquet files generated with GraphRAG into my Neo4j instance. I could run most of the initial cells and it looks like it's importing properly, but then, when loading the "COMMUNITY REPORT" data, it failed like this:

image

The error message is: "ConstraintError: {code: Neo.ClientError.Schema.ConstraintValidationFailed} {message: Node(55) already exists with label __Community__ and property community = '0'}"

Should that ID be "Community_Report" instead of "Community", given that the communities were already imported in the previous cell? I'm not sure; there might be a different reason. In any case, the notebook fails in that cell.

Btw, I'm able to load all my test PARQUET files into "graphrag-visualizer" which worked right away: https://noworneverev.github.io/graphrag-visualizer/

However, I want to visualize with NEO4J because it looks more flexible and customizable. I want to be able to filter by certain flags/types marked on the entities, etc.

Do you know how to fix this issue/error? Or is it an issue with the notebook?

I'm attaching my parquet files (as mentioned, they work in "graphrag-visualizer") so anyone can reproduce this with the Neo4j notebook: test-output-parquet-files.zip

CESARDELATORRE avatar Aug 19 '24 16:08 CESARDELATORRE

Hmm, I guess we have to decide what the primary key for Community Reports is. Currently we have both community and id, and there seems to be a conflict here where the same id is used with a different community.

Would be good to clarify which of the two is actually identifying a community uniquely.
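
One way to check this before importing is sketched below. It is only an illustration in plain Python: the rows are made up, and `key_conflicts` is a hypothetical helper, not something from the notebook. It lists the values of one column that show up with more than one value of the other column, which is the situation a uniqueness constraint on that column would reject.

```python
from collections import defaultdict

def key_conflicts(rows, key, other):
    """Return the `key` values that occur with more than one distinct `other` value.

    The result maps each such key to the set of `other` values seen with it.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[other])
    return {k: v for k, v in seen.items() if len(v) > 1}

# Made-up rows for illustration only; real rows would come from the
# community reports parquet file.
reports = [
    {"id": "a1", "community": "0"},
    {"id": "b2", "community": "1"},
    {"id": "c3", "community": "0"},  # a second report for community "0"
]

# A non-empty result means the column cannot serve as a unique key on its own.
print(key_conflicts(reports, "community", "id"))
print(key_conflicts(reports, "id", "community"))
```

With pandas, the rows could come from `pd.read_parquet(path).to_dict("records")`, where `path` points at the community reports parquet file. Whichever column yields an empty result in both directions is the safer candidate for the constraint.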

jexp avatar Aug 19 '24 17:08 jexp

Adding @tomasonjo's fix from #999

Thanks for your contribution!

AlonsoGuevara avatar Aug 23 '24 18:08 AlonsoGuevara

Can you please provide retrieval notebooks as well, covering global and local search?

k2ai avatar Aug 28 '24 14:08 k2ai

@k2ai Once it's in the Neo4j database, I use it from Neo4j Desktop or in the cloud.

What kind of "retrieval notebooks" do you think would be useful?

This is how I use it in Neo4j Desktop. Note the query in there, so it's dynamic:

image

CESARDELATORRE avatar Aug 28 '24 17:08 CESARDELATORRE

Notebooks like global_search and local_search are in the example folders, where we can ask questions directly in plain text instead of writing a Cypher query. Like this: query

k2ai avatar Aug 28 '24 19:08 k2ai

@k2ai Understood, but that's unrelated to Neo4j, right? It's done with the GraphRAG outputs, feeding them back to the GPT model. You won't use anything from Neo4j for that, right? My point is that it's a different path and usage compared to Neo4j. Am I wrong?

CESARDELATORRE avatar Aug 28 '24 19:08 CESARDELATORRE

You can look here for some inspiration for local and global search: https://towardsdatascience.com/integrating-microsoft-graphrag-into-neo4j-e0d4fa00714c

tomasonjo avatar Aug 28 '24 19:08 tomasonjo

No, no, here Neo4j plays the main role. After importing data from the MS GraphRAG .parquet files into the Neo4j database, we should be able to query the Neo4j database directly, like in the example below, which covers both the local_search and global_search retrievers: https://github.com/tomasonjo/blogs/blob/master/msft_graphrag/ms_graphrag_retriever.ipynb

k2ai avatar Aug 28 '24 19:08 k2ai

I tried this notebook and ran into some errors.

k2ai avatar Aug 28 '24 19:08 k2ai

Any more details about the errors?

tomasonjo avatar Aug 28 '24 19:08 tomasonjo

There was a conflict between Python libraries, but it has been resolved now. Retrieval from Neo4j is working fine. The only issue is that the global query is taking too much time.

k2ai avatar Sep 02 '24 07:09 k2ai

I guess we could parallelize the intermediate step if needed. That obviously depends on the number of communities.
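
A rough sketch of what that could look like, assuming the intermediate step is one independent call per community whose partial results are combined afterwards. `summarize_community` is a hypothetical stand-in for that per-community call, not a function from the retriever notebook, and the reports are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_community(report):
    """Hypothetical stand-in for the per-community call of the intermediate step."""
    return f"partial answer from community {report['community']}"

def map_step(reports, max_workers=8):
    """Run the per-community step concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize_community, reports))

# Made-up community reports for illustration only.
reports = [{"community": str(i)} for i in range(5)]
partials = map_step(reports)
print(len(partials))  # prints 5
```

Threads are a reasonable fit here only if the per-community call mostly waits on the network, as a model request would. `max_workers` would need tuning against whatever rate limits apply.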

tomasonjo avatar Sep 02 '24 09:09 tomasonjo