graphrag
[Issue]: Why do we keep the highest community level?
Describe the issue
I think we should keep the list of community ids rather than the level. The community id is needed later when calculating the report weight, so keeping the level does not seem to make sense.
entity_df = cast(pd.DataFrame, entity_df[["title", "degree", "community", "level"]]).rename(
    columns={"title": "name", "degree": "rank"}
)
entity_df["community"] = entity_df["community"].fillna(-1)
entity_df["community"] = entity_df["community"].astype(int)
entity_df["rank"] = entity_df["rank"].astype(int)
# for duplicate entities, keep the one with the highest community level
entity_df["community"] = entity_df["community"].apply(lambda x: str(x))
entity_df = (
    entity_df.groupby(["name", "rank"])
    .agg({"community": lambda x: list(x)})
    .reset_index()
)
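To make the difference concrete, here is a minimal sketch on toy data. It is not the GraphRAG implementation: the rows are invented, and the `"max"` aggregation is only an assumption about what "keep the highest" would look like in practice. It shows that collapsing duplicates to a single value drops community ids, while aggregating into a list keeps all of them.

```python
import pandas as pd

# Hypothetical toy data: an entity appears once per community it belongs to.
entity_df = pd.DataFrame(
    {
        "name": ["ALICE", "ALICE", "BOB"],
        "rank": [3, 3, 1],
        "community": [2, 7, 5],
    }
)

# Strategy A (assumed "keep the highest" behaviour): one value per entity.
# ALICE's membership in community 2 is lost.
max_df = (
    entity_df.groupby(["name", "rank"])
    .agg({"community": "max"})
    .reset_index()
)

# Strategy B: keep every community id, as strings, in a list per entity.
list_df = (
    entity_df.assign(community=entity_df["community"].astype(str))
    .groupby(["name", "rank"])
    .agg({"community": lambda x: list(x)})
    .reset_index()
)

print(max_df)
print(list_df)
```

With the list form, any later step that looks up reports by community id can see every community an entity belongs to, which is the point raised above.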