graphrag Added graphrag_import_neo4j

Added a notebook as discussed with @stevetru1 to load the result data of a GraphRAG indexing run into Neo4j.

Put it in the folder:

https://github.com/microsoft/graphrag/tree/main/examples_notebooks/neo4j

Jul 14 '24 00:07 jexp

@microsoft-github-policy-service agree company="Neo4j"

Jul 14 '24 01:07 jexp

@jexp

I am not familiar with Neo4j. When I try to run this example, it reports an error about APOC. Shall we add some instructions on how to install APOC?

neo4j.exceptions.ClientError: {code: Neo.ClientError.Procedure.ProcedureNotFound} {message: There is no procedure with the name `apoc.create.addLabels` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.}

Update:

I tried it myself: I installed Neo4j 4.4.0 with APOC, but there seems to be a compatibility issue with the commands in the notebook. Can you recommend a Docker image version, or commands that work without APOC?

Jul 16 '24 07:07 KylinMountain

Added a notebook as discussed with @stevetru1 to load the result data of a GraphRAG indexing run into Neo4j.

Put it in the folder:

https://github.com/microsoft/graphrag/tree/main/examples_notebooks/neo4j

Sorry, I can't see this page.

Jul 16 '24 10:07 BlingBunny

@KylinMountain you need Neo4j 5.x for it, with APOC enabled, e.g. with Docker:

docker run -it --rm \
  --publish=7474:7474 --publish=7687:7687 \
  --env NEO4J_AUTH=none \
  --env NEO4J_PLUGINS='["apoc"]' \
  neo4j:5.21.0
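
Once the container is up, one way to confirm that APOC is actually loaded is to ask the server for its version from Python. This is only a sketch and not part of the notebook: the helper name `apoc_version` is hypothetical, and the query runner is passed in as a callable so the function does not depend on a live connection. The usage lines in the trailing comment assume the ports published above, `NEO4J_AUTH=none`, and a 5.x Python driver that provides `Driver.execute_query`.

```python
def apoc_version(execute_query):
    """Return the APOC version string, or None if the check fails.

    `execute_query` is any callable shaped like the Neo4j Python driver's
    `Driver.execute_query` (hypothetical injection point for this sketch).
    """
    try:
        # apoc.version() is only resolvable when the APOC plugin is installed.
        result = execute_query("RETURN apoc.version() AS version")
    except Exception as exc:
        # Broad catch on purpose: a missing plugin and a failed connection
        # both end up here, so the message stays generic.
        print(f"APOC check failed: {exc}")
        return None
    return result.records[0]["version"]


# Usage against the container above (assumes `pip install neo4j`):
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver("bolt://localhost:7687", auth=None) as driver:
#       print(apoc_version(driver.execute_query))
```

If this returns `None` on a running 5.x instance, the `NEO4J_PLUGINS` setting is the first thing to check.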

Jul 16 '24 13:07 jexp

Hi @jexp

I created a PR, #593, to merge this one into main. I had some minor changes I wanted to apply to your notebook, but due to access permissions I created a new PR in our repo.

Thank you so much for your contribution! I'll keep this PR open until the other one closes.

Jul 17 '24 00:07 AlonsoGuevara

@jexp Thank you for your detailed answer, it works better now. But something still goes wrong when executing the batch import for community_statement:

neo4j.exceptions.ConstraintError: {code: Neo.ClientError.Schema.ConstraintValidationFailed} {message: Node(211) already exists with label __Community__ and property community = '25'}

Shall we use MATCH instead of MERGE on value.community? This works for me:

MATCH (c:__Community__ {community: value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id: finding_idx})
SET f += finding
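
One caveat with the MATCH variant: a row whose community has no existing `__Community__` node matches nothing, so it is skipped without an error. A minimal sketch for spotting such rows before importing, assuming the report rows are plain dicts with a `community` key and that the existing community values have already been fetched from the database (the helper name `rows_without_node` is hypothetical):

```python
def rows_without_node(rows, existing_communities):
    """Return the rows whose `community` value has no existing node.

    Values are compared as-is, so '25' and 25 count as different,
    which mirrors how Cypher compares a string with an integer.
    """
    existing = set(existing_communities)
    return [row for row in rows if row["community"] not in existing]


# Example: community '99' has no node yet, so MATCH would skip that row.
reports = [{"community": "25", "title": "A"}, {"community": "99", "title": "B"}]
print(rows_without_node(reports, ["25"]))
```

An empty result means MATCH and MERGE would touch the same set of communities.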

Jul 17 '24 03:07 KylinMountain

@KylinMountain

The solution would be to MERGE on community instead of id, so that nodes are not duplicated:

# import communities
community_statement = """
MERGE (c:__Community__ {community:value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id:finding_idx})
SET f += finding
"""
batched_import(community_statement, community_report_df)
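
For readers without the notebook open, here is a rough sketch of what a helper like `batched_import` does, which also explains where `value` in the statement comes from. It is an approximation under stated assumptions, not the notebook's code: the rows are a list of dicts rather than a DataFrame (`df.to_dict("records")` produces one), and the query runner is passed in explicitly as `run_query`, a hypothetical parameter standing in for the driver's `execute_query`.

```python
def batched_import(statement, rows, run_query, batch_size=1000):
    """Send `rows` to Neo4j in batches, exposing each row as `value`.

    Sketch only: `rows` is a list of dicts and `run_query` is a callable
    that accepts a query string plus keyword parameters, like the Neo4j
    Python driver's `Driver.execute_query`.
    """
    total = len(rows)
    for start in range(0, total, batch_size):
        batch = rows[start:start + batch_size]
        # UNWIND turns the $rows list parameter into one `value` per row,
        # which is why the statement can refer to value.community etc.
        run_query("UNWIND $rows AS value " + statement, rows=batch)
    return total
```

Batching keeps each transaction small, so a large parquet file does not have to be sent as a single oversized parameter.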

Also, since covariates are not included by default, it might be a good idea to mention that in the notebook so that users don't get confused.

Jul 28 '24 07:07 tomasonjo

Thanks for the awesome work. I noticed that you are not saving create_final_nodes.parquet, could you share the reason?

Aug 06 '24 09:08 yuleisheng

@yuleisheng what additional information does it contain that is missing here?

Aug 06 '24 10:08 tomasonjo

@tomasonjo thanks for following up here. I was reading https://github.com/microsoft/graphrag/discussions/719. In theory, one entity can map to multiple nodes, 'and thus adopts new semantic meaning and analytic properties'. I'm actually not sure how that will affect search, to be honest, and would love to hear your thoughts.

Aug 06 '24 13:08 yuleisheng

Any thoughts here? Thanks.

Aug 07 '24 15:08 yuleisheng

Your understanding is incorrect... it's only duplicated entities, no new semantic meaning.

Aug 07 '24 18:08 tomasonjo

Added graphrag_import_neo4j_cypher Notebook