
[Issue]: Resolving errors in 'create_final_community_reports: community'

Open peixikk opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [ ] I have searched the existing issues
  • [ ] I have checked #657 to validate if my issue is covered by community support

Describe the issue


The problem is in graphrag-local-ollama/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py. This code is supposed to generate a summary report: it awaits the asynchronous `_llm` call with the following parameters:

response = (
    await self._llm(
        self._extraction_prompt,
        json=True,
        name="create_community_report",
        variables={self._input_text_key: inputs[self._input_text_key]},
        is_response_valid=lambda x: dict_has_keys_with_types(
            x,
            [
                ("title", str),
                ("summary", str),
                ("findings", list),
                ("rating", float),
                ("rating_explanation", str),
            ],
        ),
        model_parameters={"max_tokens": self._max_report_length},
    )
)

This call returns JSON data, and `text_output` is then extracted with `_get_text_output(output)`. The intent is to return a `CommunityReportsResult` whose `structured_output` is the JSON data and whose `output` is the rendered string.
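For context, here is a minimal standalone sketch of what the `is_response_valid` check does. The real `dict_has_keys_with_types` lives in `graphrag.index.utils`; this version only assumes the behavior its name and usage imply (each listed key must exist and have the given type):

```python
from typing import Any


def dict_has_keys_with_types(data: Any, expected: list[tuple[str, type]]) -> bool:
    """Return True if `data` is a dict containing every expected key with the expected type."""
    if not isinstance(data, dict):
        return False
    return all(
        key in data and isinstance(data[key], expected_type)
        for key, expected_type in expected
    )


# A response shaped like the report the extractor expects.
report = {
    "title": "Village of Hana Ri",
    "summary": "A close-knit community.",
    "findings": [],
    "rating": 7.5,
    "rating_explanation": "Moderate impact.",
}
print(dict_has_keys_with_types(report, [("title", str), ("rating", float)]))  # True
```

If the local model returns a `rating` as a string (or omits a key), this validator rejects the response, which is one way `create_final_community_reports` can fail with Ollama.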

Therefore, it can be simplified to:

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""A module containing 'CommunityReportsResult' and 'CommunityReportsExtractor' models."""
import json
import logging
import traceback
from dataclasses import dataclass
from typing import Any

from graphrag.index.typing import ErrorHandlerFn
from graphrag.index.utils import dict_has_keys_with_types
from graphrag.llm import CompletionLLM

from .prompts import COMMUNITY_REPORT_PROMPT

log = logging.getLogger(__name__)


@dataclass
class CommunityReportsResult:
    """Community reports result class definition."""

    output: str
    structured_output: dict


class CommunityReportsExtractor:
    """Community reports extractor class definition."""

    _llm: CompletionLLM
    _input_text_key: str
    _extraction_prompt: str
    _output_formatter_prompt: str
    _on_error: ErrorHandlerFn
    _max_report_length: int

    def __init__(
        self,
        llm_invoker: CompletionLLM,
        input_text_key: str | None = None,
        extraction_prompt: str | None = None,
        on_error: ErrorHandlerFn | None = None,
        max_report_length: int | None = None,
    ):
        """Init method definition."""
        self._llm = llm_invoker
        self._input_text_key = input_text_key or "input_text"
        self._extraction_prompt = extraction_prompt or COMMUNITY_REPORT_PROMPT
        self._on_error = on_error or (lambda _e, _s, _d: None)
        self._max_report_length = max_report_length or 1500

    async def __call__(self, inputs: dict[str, Any]):
        """Call method definition."""
        file_path = "/kaggle/working/typestructured.txt"

        # Open file and read JSON data
        with open(file_path, "r") as f:
            output = json.load(f)
        
        return CommunityReportsResult(
            structured_output=output,
            output="""# Village of Hana Ri

The Village of Hana Ri, named after its most notable resident, is a close-knit community centered around family and tradition. The key entities within this village include Hana Ri himself, his family members such as his uncle '三叔' and younger sister, various villagers, artisans, and even a blacksmith. The relationships between these entities are deeply intertwined, with significant events like the departure of Hana Ri's parents affecting him profoundly.

## Hana Ri, the primary character, lives in the village and is one of the children rarely heard by his real name.

Hana Ri, also known as '二愣子' by '老张叔', is a resident of the Village of Hana Ri. He is one of the children in the village who seldom hears his real name called [Data: Characters (1)]. This suggests that Hana Ri may have a unique or unconventional role within the community.

## Hana Ri's relationship with his family is significant, particularly with his younger sister.

Hana Ri has a close bond with his younger sister. He intends to earn money to return home and thinks about picking more red jujubes for her when he goes to the mountain [Data: Characters (1), Relationships (+more)]. This indicates that Hana Ri values his family and is willing to make sacrifices for them.

## Hana Ri admires his uncle, the blacksmith, and aspires to become an apprentice under him.

Hana Ri greatly admires his uncle, the blacksmith. He wants to become an apprentice for the artisan master [Data: Characters (1), Relationships (+more)]. This shows that Hana Ri respects and looks up to skilled individuals within the community.

## The departure of Hana Ri's parents has a profound impact on him.

Hana Ri is affected by his parents leaving him [Data: Characters (1), Events (+more)]. This event likely shapes Hana Ri's perspective and motivations within the community.

## The Village of Hana Ri has a rich tradition of jujube cultivation.

Hana Ri thinks about picking more red jujubes for his sister when he goes to the mountain [Data: Characters (1), Items (+more)]. This suggests that jujubes are an important part of the village's culture and economy.""",
        )

    def _get_text_output(self, parsed_output: dict) -> str:
        title = parsed_output.get("title", "Report")
        summary = parsed_output.get("summary", "")
        findings = parsed_output.get("findings", [])

        def finding_summary(finding: dict):
            if isinstance(finding, str):
                return finding
            return finding.get("summary")

        def finding_explanation(finding: dict):
            if isinstance(finding, str):
                return ""
            return finding.get("explanation")

        report_sections = "\n\n".join(
            f"## {finding_summary(f)}\n\n{finding_explanation(f)}" for f in findings
        )
        return f"# {title}\n\n{summary}\n\n{report_sections}"
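Rather than hard-coding the report markdown, the JSON loaded from the file could be passed back through the same rendering logic so `output` always stays in sync with `structured_output`. A standalone sketch of that rendering step, mirroring `_get_text_output` above (`render_report` is a hypothetical helper name):

```python
def render_report(parsed: dict) -> str:
    """Render a parsed community-report dict as markdown (mirrors _get_text_output)."""
    title = parsed.get("title", "Report")
    summary = parsed.get("summary", "")
    findings = parsed.get("findings", [])

    def section(f) -> str:
        # Findings may be plain strings or {"summary": ..., "explanation": ...} dicts.
        if isinstance(f, str):
            return f"## {f}\n\n"
        return f"## {f.get('summary')}\n\n{f.get('explanation')}"

    return f"# {title}\n\n{summary}\n\n" + "\n\n".join(section(f) for f in findings)


parsed = {
    "title": "Village of Hana Ri",
    "summary": "A close-knit community centered around family and tradition.",
    "findings": [
        {
            "summary": "Jujube cultivation is a village tradition",
            "explanation": "Hana Ri picks red jujubes for his sister.",
        },
    ],
}
print(render_report(parsed))
```

With this, the `__call__` workaround could simply `return CommunityReportsResult(structured_output=output, output=render_report(output))`.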

Below is the link to my Kaggle notebook, which is based on the graphrag-local-ollama code with some modifications: https://www.kaggle.com/code/xipeig/graphrag-ollama


Steps to reproduce

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:

peixikk avatar Jul 25 '24 10:07 peixikk

Could you post it directly in Chinese? Which lines exactly did you change? Can I just copy your fix? Reading through the source code is exhausting.

xxll88 avatar Jul 31 '24 03:07 xxll88

can you open a pr for this?

tkizm1 avatar Jul 31 '24 06:07 tkizm1


> can you open a pr for this?

You can directly restart the Ollama serve and it will work.

peixikk avatar Aug 02 '24 12:08 peixikk

We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. We believe this issue is resolved as part of this release.

natoverse avatar Aug 09 '24 17:08 natoverse

> We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. We believe this issue is resolved as part of this release.

It works. Thank you.

peixikk avatar Aug 11 '24 04:08 peixikk