graphrag
How to achieve hierarchical summary generation for community reports?
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
How can community reports be generated hierarchically? While reading the code, I found that a community's report is not associated with its sub-communities; each report is generated as an independent community summary. In the implementation below, the first loop builds the context for every level before the second loop generates any report, so the `reports` argument passed to `level_context_builder` is always an empty DataFrame. That makes the `community_hierarchy` argument useless. How should I understand, and implement, bottom-up hierarchical abstraction when generating community summaries?
```python
async def summarize_communities(
    nodes: pd.DataFrame,
    communities: pd.DataFrame,
    local_contexts,
    level_context_builder: Callable,
    callbacks: WorkflowCallbacks,
    cache: PipelineCache,
    strategy: dict,
    max_input_length: int,
    async_mode: AsyncType = AsyncType.AsyncIO,
    num_threads: int = 4,
):
    """Generate community summaries."""
    reports: list[CommunityReport | None] = []
    tick = progress_ticker(callbacks.progress, len(local_contexts))
    strategy_exec = load_strategy(strategy["type"])
    strategy_config = {**strategy}

    # if max_retries is not set, inject a dynamically assigned value based on the total number of expected LLM calls to be made
    if strategy_config.get("llm") and strategy_config["llm"]["max_retries"] == -1:
        strategy_config["llm"]["max_retries"] = len(nodes)

    community_hierarchy = (
        communities.explode("children")
        .rename({"children": "sub_community"}, axis=1)
        .loc[:, ["community", "level", "sub_community"]]
    ).dropna()

    levels = get_levels(nodes)

    level_contexts = []
    for level in levels:
        level_context = level_context_builder(
            pd.DataFrame(reports),
            community_hierarchy_df=community_hierarchy,
            local_context_df=local_contexts,
            level=level,
            max_context_tokens=max_input_length,
        )
        level_contexts.append(level_context)

    for level_context in level_contexts:

        async def run_generate(record):
            result = await _generate_report(
                strategy_exec,
                community_id=record[schemas.COMMUNITY_ID],
                community_level=record[schemas.COMMUNITY_LEVEL],
                community_context=record[schemas.CONTEXT_STRING],
                callbacks=callbacks,
                cache=cache,
                strategy=strategy_config,
            )
            tick()
            return result

        local_reports = await derive_from_rows(
            level_context,
            run_generate,
            callbacks=NoopWorkflowCallbacks(),
            num_threads=num_threads,
            async_type=async_mode,
        )
        reports.extend([lr for lr in local_reports if lr is not None])

    return pd.DataFrame(reports)
```
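One way to get bottom-up abstraction would be to interleave the two loops in `summarize_communities`: build a level's context only after the reports for every finer level have been generated, so the builder can draw on sub-community reports. The sketch below is a minimal, self-contained illustration of that ordering, not GraphRAG's implementation. `Community`, `build_context`, `generate_report`, and `summarize_bottom_up` are hypothetical stand-ins (the LLM call is faked), and the sketch assumes larger level numbers are finer-grained, so levels are processed in descending order. The context policy (always use child reports when they exist) is also an assumption; GraphRAG's own builder may only substitute sub-community reports when a community's local context exceeds `max_context_tokens`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Community:
    """Hypothetical community record (not a GraphRAG type)."""
    id: int
    level: int  # assumption: larger level = finer-grained community
    local_context: str  # raw entity/relationship text for this community
    children: list[int] = field(default_factory=list)


def build_context(community: Community, reports: dict[int, str]) -> str:
    """Hypothetical builder: use sub-community reports when they already exist."""
    child_reports = [reports[c] for c in community.children if c in reports]
    if child_reports:
        return "\n".join(child_reports)
    return community.local_context


async def generate_report(community: Community, context: str) -> str:
    """Hypothetical stand-in for the LLM call that writes a report."""
    await asyncio.sleep(0)
    return f"report[{community.id}] <- ({context})"


async def summarize_bottom_up(communities: list[Community]) -> dict[int, str]:
    reports: dict[int, str] = {}
    # Finest level first, so children are summarized before their parents.
    for level in sorted({c.level for c in communities}, reverse=True):
        members = [c for c in communities if c.level == level]
        # Build this level's contexts only now, after finer levels have reports.
        contexts = [build_context(c, reports) for c in members]
        results = await asyncio.gather(
            *(generate_report(c, ctx) for c, ctx in zip(members, contexts))
        )
        reports.update({c.id: r for c, r in zip(members, results)})
    return reports


communities = [
    Community(0, 0, "root local context", [1, 2]),
    Community(1, 1, "entities A, B"),
    Community(2, 1, "entities C, D"),
]

# Eager pattern (contexts built before any report exists): the root
# never sees its children's reports and falls back to raw local context.
eager_root_context = build_context(communities[0], {})

reports = asyncio.run(summarize_bottom_up(communities))
print(reports[0])
```

In terms of the quoted code, this would roughly correspond to calling `level_context_builder` inside the per-level generation loop, so that `pd.DataFrame(reports)` already contains the finer levels' reports, instead of precomputing all of `level_contexts` up front.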
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
# Paste your config here
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: