[Issue]: Why do we keep the highest community level?

Open kShines opened this issue 1 year ago • 0 comments

Describe the issue

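I think we should keep the list of community IDs, not the level. The community ID is required later when calculating the report weight, so keeping the level does not seem to make sense. The comment in the code below says it keeps the entry with the highest community level, but the code actually collects the community IDs into a list: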

    entity_df = cast(pd.DataFrame, entity_df[["title", "degree", "community", "level"]]).rename(
        columns={"title": "name", "degree": "rank"}
    )

    entity_df["community"] = entity_df["community"].fillna(-1)
    entity_df["community"] = entity_df["community"].astype(int)
    entity_df["rank"] = entity_df["rank"].astype(int)

    # for duplicate entities, keep the one with the highest community level
    entity_df["community"] = entity_df["community"].apply(lambda x: str(x))
    entity_df = (
        entity_df.groupby(["name", "rank"])
        .agg({"community": lambda x: list(x)})
        .reset_index()
    )
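To make the behavior concrete, here is a minimal sketch that runs the same `groupby`/`agg` step on made-up data. The entity names, ranks, community IDs, and levels are hypothetical and not taken from a real index; only the grouping logic mirrors the snippet above. It suggests that the result holds a list of community IDs per entity and that `level` is not used by the aggregation at all.

```python
import pandas as pd

# Hypothetical toy data: entity "A" belongs to community 3 (level 0)
# and community 7 (level 1); entity "B" has no community (-1).
entity_df = pd.DataFrame(
    {
        "name": ["A", "A", "B"],
        "rank": [5, 5, 2],
        "community": [3, 7, -1],
        "level": [0, 1, 0],
    }
)

# Same steps as the quoted snippet: stringify the IDs, then collect
# them into one list per (name, rank) pair.
entity_df["community"] = entity_df["community"].astype(str)
grouped = (
    entity_df.groupby(["name", "rank"])
    .agg({"community": lambda x: list(x)})
    .reset_index()
)

print(grouped)
```

In this sketch the `level` column is silently dropped because it is neither a grouping key nor an aggregated column, which is why selecting it in the first place looks unnecessary to me.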

Steps to reproduce

No response

GraphRAG Config Used

No response

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:

kShines • Jul 17 '24 09:07