Difference between nodes and entities
Describe the issue
I'm having some difficulty understanding the difference between the generated nodes and entities.
I believe that these two coincide, but for some datasets the nodes are just the entities (for some reason repeated twice), whereas in others this 2x correlation does not hold.
Thanks
As I understand it, if you print both create_final_nodes.parquet and create_final_entities.parquet, you will notice the difference. create_final_nodes.parquet contains information about the entities, whereas create_final_entities.parquet contains information about the entities and also includes the embeddings. Both files are used to create the entities variable in global and local search.
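import pandas as pd

from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)

# Community level passed to read_indexer_entities; 2 is an assumed example value.
COMMUNITY_LEVEL = 2

# Nodes file: entity information without embeddings.
entities_df = pd.read_parquet(r"..\artifacts\create_final_nodes.parquet")
entities_df.head(1)

# Entities file: entity information plus the embeddings.
entity_embeddings_df = pd.read_parquet(r"..\artifacts\create_final_entities.parquet")
entity_embeddings_df.head(1)

entities = read_indexer_entities(entities_df, entity_embeddings_df, COMMUNITY_LEVEL)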
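To see exactly which columns each file carries, a quick column comparison may help. This is only a sketch: the two DataFrames below are small hypothetical stand-ins for the parquet files, the column names (`id`, `title`, `level`, `name`, `description_embedding`) are assumptions for illustration, and `compare_columns` is a helper made up here, not part of GraphRAG.

```python
import pandas as pd

# Hypothetical stand-ins for the two parquet files; column names are assumptions.
nodes_df = pd.DataFrame(
    {"id": ["e1", "e1", "e2"], "title": ["A", "A", "B"], "level": [0, 1, 0]}
)
entities_df = pd.DataFrame(
    {"id": ["e1", "e2"], "name": ["A", "B"], "description_embedding": [[0.1], [0.2]]}
)


def compare_columns(left, right):
    """Return (only_in_left, only_in_right, shared) column names, each sorted."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    return (
        sorted(left_cols - right_cols),
        sorted(right_cols - left_cols),
        sorted(left_cols & right_cols),
    )


only_nodes, only_entities, shared = compare_columns(nodes_df, entities_df)
print("only in nodes:   ", only_nodes)
print("only in entities:", only_entities)
print("shared:          ", shared)
```

With the real files, you would pass the two DataFrames loaded with pd.read_parquet instead of the stand-ins.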
Thank you for your reply.
Yes, obviously I inspected the two files and noticed that the embeddings are present only in the entities file.
My question is whether there is any conceptual difference between nodes and entities. Also, in the nodes file for many datasets, the same entity appears more than once (2 or 3 times).
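One way to narrow down the repetition is to count the rows per entity and check which columns actually differ between the repeated rows. The sketch below uses a small hypothetical table in place of create_final_nodes.parquet; the column names (`title`, `level`, `community`, `degree`) are assumptions, and `repeat_counts` and `varying_columns` are helpers made up for this example.

```python
import pandas as pd


def repeat_counts(df, key):
    """Number of rows per value of `key`, largest first."""
    return df.groupby(key).size().sort_values(ascending=False)


def varying_columns(df, key):
    """Columns that take more than one distinct value within at least one `key` group."""
    per_group = df.groupby(key).nunique(dropna=False)
    return [col for col in per_group.columns if (per_group[col] > 1).any()]


# Hypothetical stand-in for the nodes file; column names are assumptions.
nodes_df = pd.DataFrame(
    {
        "title": ["ALICE", "ALICE", "ALICE", "BOB", "BOB"],
        "level": [0, 1, 2, 0, 1],
        "community": ["1", "4", "9", "2", "5"],
        "degree": [3, 3, 3, 1, 1],
    }
)

counts = repeat_counts(nodes_df, "title")
varying = varying_columns(nodes_df, "title")
print(counts)
print("columns that differ between repeated rows:", varying)
```

If a column such as `level` turns out to be what varies in your own output, that would suggest one row per entity per community level, which would also explain why the repeat count differs between datasets. That is an assumption to check against your data, not a confirmed answer. Columns that hold lists or arrays would likely need to be dropped or converted before calling `nunique`.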
Created Q&A here: https://github.com/microsoft/graphrag/discussions/719