Added graphrag_import_neo4j_cypher Notebook

Open jexp opened this issue 1 year ago • 13 comments

Added a notebook as discussed with @stevetru1 to load the result data of a GraphRAG indexing run into Neo4j.

Put it in the folder:

https://github.com/microsoft/graphrag/tree/main/examples_notebooks/neo4j

jexp avatar Jul 14 '24 00:07 jexp

@microsoft-github-policy-service agree company="Neo4j"

jexp avatar Jul 14 '24 01:07 jexp

@jexp

I am not familiar with Neo4j. I tried to run this example, and it reports an error about APOC. Should we add instructions on how to install APOC?

neo4j.exceptions.ClientError: {code: Neo.ClientError.Procedure.ProcedureNotFound} {message: There is no procedure with the name `apoc.create.addLabels` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.}

Update:

I tried installing Neo4j 4.4.0 with APOC myself, but there seems to be a compatibility issue with the commands in the notebook. Can you recommend a Docker image version, or provide the commands without APOC?

KylinMountain avatar Jul 16 '24 07:07 KylinMountain

Added a notebook as discussed with @stevetru1 to load the result data of a GraphRAG indexing run into Neo4j.

Put it in the folder:

https://github.com/microsoft/graphrag/tree/main/examples_notebooks/neo4j

Sorry, I can't see this page.

BlingBunny avatar Jul 16 '24 10:07 BlingBunny

@KylinMountain you need Neo4j 5.x with APOC enabled, e.g. with Docker:

docker run -it --rm \
  --publish=7474:7474 --publish=7687:7687 \
  --env NEO4J_AUTH=none \
  --env NEO4J_PLUGINS='["apoc"]' \
  neo4j:5.21.0

jexp avatar Jul 16 '24 13:07 jexp

Hi @jexp

I created a PR, #593, to merge this one into main. I had some minor changes I wanted to apply to your notebook, but due to access permissions I created a new PR in our repo.

Thank you so much for your contribution! I'll keep this PR open until the other one closes.

AlonsoGuevara avatar Jul 17 '24 00:07 AlonsoGuevara

@jexp Thank you for your detailed answer, it works better now. But something still goes wrong when executing the batched import with community_statement:

neo4j.exceptions.ConstraintError: {code: Neo.ClientError.Schema.ConstraintValidationFailed} {message: Node(211) already exists with label __Community__ and property community = '25'}

Should we use MATCH instead of MERGE on value.community? This works for me:

MATCH (c:__Community__ {community: value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id: finding_idx})
SET f += finding
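
To illustrate the trade-off, here is a toy in-memory model in plain Python (not Neo4j, and not the notebook's code) of MATCH and MERGE against a uniqueness constraint on `community`. It assumes the failing statement merged on a different key, such as `id`, while a node with the same `community` value already existed. The names `match_node`, `merge_node`, and the local `ConstraintError` class are hypothetical, and the model checks the constraint up front rather than reproducing Neo4j's exact behavior.

```python
class ConstraintError(Exception):
    """Stand-in for neo4j.exceptions.ConstraintError in this toy model."""


def match_node(nodes, key, value):
    """Like MATCH: return the node whose `key` equals `value`, or None."""
    return next((n for n in nodes if n.get(key) == value), None)


def merge_node(nodes, key, row, unique_key="community"):
    """Like MERGE on `key`: reuse a matching node, otherwise create one.

    Creating a node fails if another node already holds the same
    `unique_key` value, mimicking a uniqueness constraint (assumption).
    """
    node = match_node(nodes, key, row[key])
    if node is None:
        if match_node(nodes, unique_key, row[unique_key]) is not None:
            raise ConstraintError(
                f"node already exists with {unique_key} = {row[unique_key]!r}"
            )
        node = {key: row[key]}
        nodes.append(node)
    node.update(row)  # corresponds to SET c += value {...}
    return node


# A community node created earlier in the import, keyed by `community`.
nodes = [{"community": "25"}]

# Merging on `community` finds the existing node and updates it in place.
merged = merge_node(
    nodes, "community", {"id": "report-25", "community": "25", "title": "Example"}
)

# Merging on `id` finds no match, tries to create a second node, and fails.
try:
    merge_node(nodes, "id", {"id": "other", "community": "25"})
except ConstraintError as exc:
    error = str(exc)

# MATCH never creates: a row for an unknown community is simply skipped.
skipped = match_node(nodes, "community", "99")
```

In this model, MATCH avoids the error but silently drops rows whose community node does not exist yet, while MERGE on the constrained key both avoids the error and creates missing nodes.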

KylinMountain avatar Jul 17 '24 03:07 KylinMountain

@KylinMountain

The solution would be to MERGE on community instead of id, so that nodes are not duplicated:

# import communities
community_statement = """
MERGE (c:__Community__ {community:value.community})
SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id:finding_idx})
SET f += finding
"""
batched_import(community_statement, community_report_df)

Also, since covariates are not included by default, it might be a good idea to mention that in the notebook so that users don't get confused.
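
For readers without the notebook at hand: `batched_import` is defined in the notebook and is not shown in this thread. The sketch below is an assumption about its general shape, not the notebook's code. It splits the rows into batches and prefixes the statement with `UNWIND $rows AS value`, which would explain where the `value` variable in the Cypher comes from. The function name `batched_import_sketch`, the `run_query` callable, and the `batch_size` default are all hypothetical.

```python
from typing import Any, Callable, Iterable


def batched_import_sketch(
    statement: str,
    rows: Iterable[dict[str, Any]],
    run_query: Callable[..., Any],
    batch_size: int = 1000,
) -> int:
    """Send `rows` to `run_query` in batches; return the number of rows sent.

    Hypothetical stand-in for the notebook's `batched_import`. Each batch is
    passed as the `rows` query parameter, and the statement is prefixed so
    that every row is bound to the Cypher variable `value`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    all_rows = list(rows)
    query = "UNWIND $rows AS value " + statement
    for start in range(0, len(all_rows), batch_size):
        run_query(query, rows=all_rows[start : start + batch_size])
    return len(all_rows)


# Usage with a fake executor that only records calls (no database needed).
calls = []
sent = batched_import_sketch(
    "MERGE (c:__Community__ {community: value.community})",
    [{"community": str(i)} for i in range(5)],
    run_query=lambda query, rows: calls.append((query, rows)),
    batch_size=2,
)
```

Against a real database, `run_query` could be a thin wrapper around the Neo4j Python driver's `driver.execute_query`, and a pandas DataFrame can be converted to rows of this shape with `df.to_dict("records")`.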

tomasonjo avatar Jul 28 '24 07:07 tomasonjo

Thanks for the awesome work. I noticed that you are not saving create_final_nodes.parquet. Could you share the reason?

yuleisheng avatar Aug 06 '24 09:08 yuleisheng

@yuleisheng what additional information does it contain that is missing?

tomasonjo avatar Aug 06 '24 10:08 tomasonjo

@tomasonjo thanks for following up here. I was reading https://github.com/microsoft/graphrag/discussions/719. In theory, one entity can map to multiple nodes, 'and thus adopts new semantic meaning and analytic properties'. To be honest, I'm not sure how this will affect search, and I would love to hear your thoughts.

yuleisheng avatar Aug 06 '24 13:08 yuleisheng

Any thoughts here? Thanks.

yuleisheng avatar Aug 07 '24 15:08 yuleisheng

Your understanding is incorrect... it's only duplicate entities, no new semantic meaning.

tomasonjo avatar Aug 07 '24 18:08 tomasonjo