Added graphrag_import_neo4j_cypher Notebook
Added a notebook as discussed with @stevetru1 to load the result data of a GraphRAG indexing run into Neo4j.
Put it in the folder:
https://github.com/microsoft/graphrag/tree/main/examples_notebooks/neo4j
@microsoft-github-policy-service agree company="Neo4j"
@jexp
I am not familiar with Neo4j. I tried to run this example, and it reports an error about APOC. Should we add some instructions on how to install APOC?
neo4j.exceptions.ClientError: {code: Neo.ClientError.Procedure.ProcedureNotFound} {message: There is no procedure with the name `apoc.create.addLabels` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.}
Update:
I tried it myself and installed Neo4j 4.4.0 with APOC, but there seems to be a compatibility issue with the commands in the notebook. Can you recommend a Docker image version, or commands that work without APOC?
Added a notebook as discussed with @stevetru1 to load the result data of a GraphRAG indexing run into Neo4j.
Put it in the folder:
https://github.com/microsoft/graphrag/tree/main/examples_notebooks/neo4j
Sorry, I can't see this page.
@KylinMountain you need Neo4j 5.x with APOC enabled, e.g. with Docker:
docker run -it --rm \
--publish=7474:7474 --publish=7687:7687 \
--env NEO4J_AUTH=none \
--env NEO4J_PLUGINS='["apoc"]' \
neo4j:5.21.0
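To confirm that APOC was actually loaded before running the notebook, a small check like the following might help. It is only a sketch: the helper name `apoc_version` is hypothetical, and it assumes a recent 5.x `neo4j` Python driver whose `driver.execute_query` returns a `(records, summary, keys)` result.

```python
def apoc_version(driver):
    """Return the APOC version reported by the database.

    Hypothetical helper: `driver` is assumed to be a neo4j Python driver
    instance. If APOC is not installed, the query is expected to fail
    with a client error instead of returning a version.
    """
    # apoc.version() is a function shipped with the APOC plugin.
    records, _summary, _keys = driver.execute_query(
        "RETURN apoc.version() AS version"
    )
    return records[0]["version"]
```

With the Docker settings above (authentication disabled), the driver could be created with `GraphDatabase.driver("bolt://localhost:7687")` from the `neo4j` package and passed to `apoc_version`.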
Hi @jexp
I created a PR, #593, to merge this one into main. I had some minor changes I wanted to apply to your notebook, but due to access permissions I created a new PR in our repo.
Thank you so much for your contribution! I'll keep this PR open until the other one closes.
@KylinMountain you need neo4j 5.x for it and enable APOC, e.g. with docker
docker run -it --rm \
--publish=7474:7474 --publish=7687:7687 \
--env NEO4J_AUTH=none \
--env NEO4J_PLUGINS='["apoc"]' \
neo4j:5.21.0
@jexp
Thank you for your detailed answer, it works better now. But something still goes wrong when executing the batched community_statement.
neo4j.exceptions.ConstraintError: {code: Neo.ClientError.Schema.ConstraintValidationFailed} {message: Node(211) already exists with label __Community__ and property community = '25'}
Shall we use MATCH instead of MERGE on value.community? This works for me.
MATCH (c:__Community__ {community: value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id: finding_idx})
SET f += finding
@KylinMountain
The solution would be to MERGE on community instead of ID so that nodes are not duplicated:
# import communities
community_statement = """
MERGE (c:__Community__ {community:value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id:finding_idx})
SET f += finding
"""
batched_import(community_statement, community_report_df)
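Because the statement refers to `value`, the import helper presumably unwinds a list of row dictionaries into that variable. The following is a minimal sketch of that idea, not the notebook's actual implementation: the name `batched_import_sketch`, the `run_query` callable, and the default batch size are all assumptions made for illustration.

```python
def batched_import_sketch(statement, rows, run_query, batch_size=1000):
    """Send rows to the database in batches (hypothetical helper).

    Each row is bound to the Cypher variable `value` by prefixing the
    statement with an UNWIND over the `$rows` parameter.
    `run_query` is any callable accepting (query, rows=...).
    """
    rows = list(rows)
    query = "UNWIND $rows AS value " + statement
    for start in range(0, len(rows), batch_size):
        # Slice one batch; the final batch may be smaller than batch_size.
        run_query(query, rows=rows[start:start + batch_size])
    return len(rows)
```

With the official driver, `run_query` could be `driver.execute_query`, and a DataFrame such as `community_report_df` could be converted with `to_dict("records")` before being passed as `rows`.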
Also, since covariates are not included by default, it might be a good idea to note that in the notebook so that users don't get confused.
Thanks for the awesome work. I noticed that you are not saving create_final_nodes.parquet. Could you share the reason?
@yuleisheng what additional information does it contain that is missing?
@tomasonjo thanks for following up here. I was reading https://github.com/microsoft/graphrag/discussions/719. In theory, one entity can map to multiple nodes 'and thus adopts new semantic meaning and analytic properties'. I'm actually not sure how that affects search, to be honest. I would love to hear your thoughts.
Any thoughts here? Thanks.
Your understanding is incorrect. It's only duplicate entities, with no new semantic meaning.