[Serious bug] Text files with Chinese content are not supported
I attempted to run a RAG test using Qian Zhongshu's "Fortress Besieged" and encountered the following errors.
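A likely contributing factor (an assumption on my part, not confirmed in this thread): the input .txt file is saved in a legacy Chinese encoding such as GBK rather than the UTF-8 the loader expects, which would explain the replacement characters in the logs below. Re-encoding the file before indexing is a cheap check; a minimal sketch (`reencode_to_utf8` is a hypothetical helper, not part of GraphRAG):

```python
from pathlib import Path

def reencode_to_utf8(src: str, dst: str, src_encoding: str = "gb18030") -> None:
    """Re-save a legacy-encoded Chinese text file as UTF-8.

    gb18030 is a superset of GBK/GB2312, so it decodes most legacy
    Chinese files; adjust src_encoding if your file uses Big5, etc.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```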
The pipeline output:
❌ create_final_community_reports
None
⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.
The logs show:
10:54:04,391 graphrag.llm.openai.utils ERROR error loading json, json=```json
{
"title": "�����̼�������������",
"summary": "�������Է����̼���Ϊ���ģ��漰���м��������ʵ�塣��������Ϊ���峤�����Լ�ͥ���������Ӱ�죬������������ҽҩ���档������Ϊ�������ģ�������Ա�����еĹ����Ͳ�����ϵ�������ڵĹ�ϵ���ӣ��漰�����ڲ��Ľ��������þ����Լ����ⲿʵ��Ļ�����",
"rating": 6.5,
"rating_explanation": "��������Ӱ������Ϊ�е�ƫ�ϣ���Ҫ��Ϊ�����ڲ��Ľ����;��þ��߿��ܶ����������������Ӧ��",
"findings": [
{
"summary": "�������ڼ����еĺ��ĵ�λ",
"explanation": "��������Ϊ�����еij������Լ�ͥ��������Զ��Ӱ�졣�����������ӵ��������ж��صļ��⣬���Լ�ͥҽҩ������Ȥ�������Դ����ϱ���IJ��顣�����̵���Ϊ�;����ڼ����о����쵼���ã������ռǺ�����¼��ϸ��¼�������쵼���������Կ���[Data: Entities (282), Relationships (639, 1316, 1323, 1317, 1320, 1325, 1321, 1318, 1324, 1322)]��"
},
{
"summary": "�����������еĽ�ɫ",
"explanation": "���в����Ǽ����Ա�����ĵص㣬Ҳ�Ǵ�������ʹ�������ġ����轥�����й�������ʾ������ְҵ��ݺ������еĹ�ϵ�����л��漰�������Ա�ľ��þ��ߣ��緽�轥�ƻ�ȥ����֧���˵�����ʾ���IJ���������ͥ����״���й�[Data: Entities (122), Relationships (681, 1075, 276, 1073, 1079, 1071, 1063, 1076, 1074, 1077, 1078)]��"
},
{
"summary": "�����ڲ��Ľ�����������ͳ",
"explanation": "�����̶����ӵ������ж��صļ��⣬���Ϊ����ȡ��Ϊ���ǹ����������ù��������������塣����������ͳ��ӳ�˼���Խ��������ӺͶԴ�ͳ�Ļ������ء������̵���Ϊ�;����ڼ����о����쵼���ã������ռǺ�����¼��ϸ��¼�������쵼���������Կ���[Data: Entities (282), Relationships (1316, 1321, 1318, 1320)]��"
},
{
"summary": "�����Ա���ⲿʵ��Ļ���",
"explanation": "���轥�����еĹ�ϵ�������ڹ��������������ⲿʵ��Ļ�����������С����ż����������ֻ�����ʾ�˼����Ա�������е��罻��ְҵ���硣������Ϊ�������ģ�������Ա�����еĹ����Ͳ�����ϵ[Data: Entities (122), Relationships (681, 276, 1074)]��"
},
{
"summary": "�����Ա�ľ��þ���",
"explanation": "�������ھ��þ����ϱ��ֳ��������������ѵ��Ϻ���ԸΪ���ӹ�Ӷ��ĸ�����־��þ��߷�ӳ�˼������ض������µ���Ӧ�ԺͶ���Դ�ĺ������á�������Ϊ�������ģ�������Ա�����еĹ����Ͳ�����ϵ[Data: Entities (282), Relationships (449, 540, 454, 1071, 1063, 1076, 1074, 1077, 1078)]��"
}
]
}
Traceback (most recent call last):
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
result = json.loads(input)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,391 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
await self._llm(
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
result = await action(retry_state)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\_utils.py", line 99, in inner
return call(*args, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 451, in result
return self.__get_result()
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 403, in __get_result
raise self._exception
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\base_llm.py", line 48, in __call__
return await self._invoke_json(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 82, in _invoke_json
result = await generate()
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 74, in generate
await self._native_json(input, **{**kwargs, "name": call_name})
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 108, in _native_json
json_output = try_parse_json_object(raw_output)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
result = json.loads(input)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,394 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
10:54:04,394 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 71
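Note that the log line shows the raw LLM output begins with a Markdown fence (`json=```json`), and `json.loads` fails with "Expecting value: line 1 column 1 (char 0)", i.e. at the very first character. A sketch of a more tolerant parse that strips an optional fence before decoding (this is my own illustration, not GraphRAG's actual `try_parse_json_object` implementation):

```python
import json
import re

def parse_fenced_json(raw: str):
    """Parse JSON that may be wrapped in a Markdown ```json ... ``` fence.

    The closing fence is optional, since truncated LLM output may
    open a fence without closing it.
    """
    match = re.search(r"```(?:json)?\s*(.*?)\s*(?:```|$)", raw, re.DOTALL)
    text = match.group(1) if match else raw
    return json.loads(text)
```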
I also tried another Chinese text, and it indexed normally when the content was UTF-8. Entities and relationships were generated correctly, including the graph. However, the Chinese characters in the intermediate output appear as Unicode escape sequences rather than readable characters. I hope this can be improved to emit the characters themselves; it looks like a character-encoding issue.
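If the `\uXXXX` escapes come from a serialization step using Python's `json` module (an assumption; I have not traced GraphRAG's writer), they are the default `ensure_ascii=True` behavior and are lossless. Passing `ensure_ascii=False` emits the characters directly:

```python
import json

record = {"title": "围城"}
# Default: non-ASCII characters are escaped as \uXXXX sequences.
print(json.dumps(record))                      # {"title": "\u56f4\u57ce"}
# ensure_ascii=False keeps the characters readable.
print(json.dumps(record, ensure_ascii=False))  # {"title": "围城"}
```

Either form round-trips through `json.loads` to the same string, so the escapes are cosmetic rather than data corruption.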
Yeah, I am able to index a Chinese web novel. You can refer to the document on my WeChat official account (weixin gongzhonghao).
I did it with the web novel Xian Ni (仙逆) and it succeeded.
Do you know how to fix this error?
same here. It appears randomly.
Check your logs. It is probably the LLM failing with "Error Invoking LLM", causing a ReadTimeout and finally a KeyError: 'community'.
Consolidating language support issues here: #696
My timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?
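For timeouts during embedding, one common mitigation is to raise the request timeout and lower concurrency in settings.yaml. A hedged fragment (these field names match the default settings.yaml generated by graphrag at the time of this thread, but verify them against your installed version):

```yaml
# settings.yaml -- embeddings section (assumed field names, check your graphrag version)
embeddings:
  llm:
    request_timeout: 300.0   # raise the per-request timeout (seconds)
    max_retries: 10          # retry transient failures
    concurrent_requests: 5   # lower concurrency to reduce timeouts
```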
I have the same error. Have you resolved it?
I adjusted the embeddings section of the settings.yaml file. First, start an embedding model service with fastchat; reference commands below:
# Start the controller
python -m fastchat.serve.controller --host 127.0.0.1 --port 21001
# Start the model_worker
python -m fastchat.serve.model_worker --device cpu --model-names bge-m3 --model-path D:/Models/embedding/bge-m3 --controller-address http://127.0.0.1:21001 --worker-address http://127.0.0.1:8080 --host 0.0.0.0 --port 8080
# Start the OpenAI-compatible API server
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 9000
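Once the three fastchat processes above are running, you can sanity-check the server before pointing GraphRAG at it. A small sketch that builds an OpenAI-style /v1/embeddings request with only the standard library (`embedding_request` is my own helper, and I am assuming fastchat's default OpenAI-compatible route):

```python
import json
import urllib.request

def embedding_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Usage, assuming the openai_api_server above is listening on port 9000:
# with urllib.request.urlopen(embedding_request("http://127.0.0.1:9000", "bge-m3", ["测试"])) as r:
#     vec = json.loads(r.read())["data"][0]["embedding"]
```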
A reference settings.yaml configuration follows: