Difference between nodes and entities

Open namp opened this issue 1 year ago • 2 comments

Describe the issue

I'm having some difficulty understanding the difference between the generated nodes and entities.

I believe that these two coincide, but for some datasets the nodes are just the entities (for some reason repeated twice), whereas in others this x2 correlation does not hold.

Thanks

Steps to reproduce

No response

GraphRAG Config Used

No response

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:

namp • Jul 15 '24 10:07

As I understand it, the difference shows up if you print both create_final_nodes.parquet and create_final_entities.parquet. create_final_nodes.parquet contains information about the entities, whereas create_final_entities.parquet contains entity information and also the embeddings. Both are used to create the entities variable for global and local search:

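import pandas as pd

from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)

# Community level passed to read_indexer_entities; the value 2 is an assumption, adjust as needed.
COMMUNITY_LEVEL = 2

# Despite the variable name, this table is the nodes file (entity information, no embeddings).
entities_df = pd.read_parquet(r"..\artifacts\create_final_nodes.parquet")
entities_df.head(1)

# This table holds the entities together with their embeddings.
entity_embeddings_df = pd.read_parquet(r"..\artifacts\create_final_entities.parquet")
entity_embeddings_df.head(1)

# Combine both tables into the entities variable used by global and local search.
entities = read_indexer_entities(entities_df, entity_embeddings_df, COMMUNITY_LEVEL)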

kouskouss • Jul 16 '24 11:07

Thank you for your reply.

Yes, I did inspect the two files and noticed that the embeddings are present only in the entities file.

My question is whether there is any conceptual difference between nodes and entities. Also, in the nodes file for many datasets, the entities are repeated more than once (2 or 3 times).
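
One way to investigate the repetition is to group the node rows by entity name and look at which columns differ between the copies. Below is a minimal sketch using made-up rows; the column names "title", "level" and "community" are assumptions for illustration and are not confirmed from the actual nodes file, and summarize_repeats is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical rows standing in for create_final_nodes.parquet.
# The column names "title", "level" and "community" are assumptions.
node_rows = [
    {"title": "ALICE", "level": 0, "community": "1"},
    {"title": "ALICE", "level": 1, "community": "4"},
    {"title": "BOB", "level": 0, "community": "1"},
    {"title": "BOB", "level": 1, "community": "5"},
    {"title": "CAROL", "level": 0, "community": "2"},
]


def summarize_repeats(rows, key="title"):
    """Group rows by `key`; report the row count and the columns that vary per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = {}
    for name, members in groups.items():
        # A column "varies" if the repeated rows hold more than one distinct value in it.
        varying = sorted(
            col
            for col in members[0]
            if col != key and len({m[col] for m in members}) > 1
        )
        summary[name] = {"rows": len(members), "varying_columns": varying}
    return summary


summary = summarize_repeats(node_rows)
for name, info in summary.items():
    print(name, info)
```

With the real file, the rows could be obtained with pd.read_parquet(...).to_dict("records"). If the repeated rows differ only in something like a hierarchy level or community column, the copies are probably one row per level rather than true duplicates, but that would need to be checked against the actual data.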

namp • Jul 16 '24 11:07

Created Q&A here: https://github.com/microsoft/graphrag/discussions/719

natoverse • Jul 25 '24 20:07