[Serious bug] Text files with Chinese content are not supported
I attempted to run a RAG test using Qian Zhongshu's "Fortress Besieged" and encountered the following errors.
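A likely contributing factor (an assumption on my part, not confirmed in this thread): the input .txt file is saved in a legacy Chinese encoding such as GBK rather than the UTF-8 the loader expects, which would explain the replacement characters in the logs below. Re-encoding the file before indexing is a cheap check; a minimal sketch (`reencode_to_utf8` is a hypothetical helper, not part of GraphRAG):

```python
from pathlib import Path

def reencode_to_utf8(src: str, dst: str, src_encoding: str = "gb18030") -> None:
    """Re-save a legacy-encoded Chinese text file as UTF-8.

    gb18030 is a superset of GBK/GB2312, so it decodes most legacy
    Chinese files; adjust src_encoding if your file uses Big5, etc.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```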
The pipeline output:
❌ create_final_community_reports
None
⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.
The logs show:
10:54:04,391 graphrag.llm.openai.utils ERROR error loading json, json=```json
{
"title": "�����̼�������������",
"summary": "�������Է����̼���Ϊ���ģ��漰���м��������ʵ�塣��������Ϊ���峤�����Լ�ͥ���������Ӱ�죬������������ҽҩ���档������Ϊ�������ģ�������Ա�����еĹ����Ͳ�����ϵ�������ڵĹ�ϵ���ӣ��漰�����ڲ��Ľ��������þ����Լ����ⲿʵ��Ļ�����",
"rating": 6.5,
"rating_explanation": "��������Ӱ������Ϊ�е�ƫ�ϣ���Ҫ��Ϊ�����ڲ��Ľ����;��þ��߿��ܶ����������������Ӧ��",
"findings": [
{
"summary": "�������ڼ����еĺ��ĵ�λ",
"explanation": "��������Ϊ�����еij������Լ�ͥ��������Զ��Ӱ�졣�����������ӵ��������ж��صļ��⣬���Լ�ͥҽҩ������Ȥ�������Դ����ϱ���IJ��顣�����̵���Ϊ�;����ڼ����о����쵼���ã������ռǺ�����¼��ϸ��¼�������쵼���������Կ���[Data: Entities (282), Relationships (639, 1316, 1323, 1317, 1320, 1325, 1321, 1318, 1324, 1322)]��"
},
{
"summary": "�����������еĽ�ɫ",
"explanation": "���в����Ǽ����Ա�����ĵص㣬Ҳ�Ǵ�������ʹ�������ġ����轥�����й�������ʾ������ְҵ��ݺ������еĹ�ϵ�����л��漰�������Ա�ľ��þ��ߣ��緽�轥�ƻ�ȥ����֧���˵�����ʾ���IJ���������ͥ����״���й�[Data: Entities (122), Relationships (681, 1075, 276, 1073, 1079, 1071, 1063, 1076, 1074, 1077, 1078)]��"
},
{
"summary": "�����ڲ��Ľ�����������ͳ",
"explanation": "�����̶����ӵ������ж��صļ��⣬���Ϊ����ȡ��Ϊ���ǹ����������ù��������������塣����������ͳ��ӳ�˼���Խ��������ӺͶԴ�ͳ�Ļ������ء������̵���Ϊ�;����ڼ����о����쵼���ã������ռǺ�����¼��ϸ��¼�������쵼���������Կ���[Data: Entities (282), Relationships (1316, 1321, 1318, 1320)]��"
},
{
"summary": "�����Ա���ⲿʵ��Ļ���",
"explanation": "���轥�����еĹ�ϵ�������ڹ��������������ⲿʵ��Ļ�����������С����ż����������ֻ�����ʾ�˼����Ա�������е��罻��ְҵ���硣������Ϊ�������ģ�������Ա�����еĹ����Ͳ�����ϵ[Data: Entities (122), Relationships (681, 276, 1074)]��"
},
{
"summary": "�����Ա�ľ��þ���",
"explanation": "�������ھ��þ����ϱ��ֳ��������������ѵ��Ϻ���ԸΪ���ӹ�Ӷ��ĸ�����־��þ��߷�ӳ�˼������ض������µ���Ӧ�ԺͶ���Դ�ĺ������á�������Ϊ�������ģ�������Ա�����еĹ����Ͳ�����ϵ[Data: Entities (282), Relationships (449, 540, 454, 1071, 1063, 1076, 1074, 1077, 1078)]��"
}
]
}
Traceback (most recent call last):
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
result = json.loads(input)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,391 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
await self._llm(
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
result = await action(retry_state)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\_utils.py", line 99, in inner
return call(*args, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 451, in result
return self.__get_result()
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\concurrent\futures\_base.py", line 403, in __get_result
raise self._exception
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\base\base_llm.py", line 48, in __call__
return await self._invoke_json(input, **kwargs)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 82, in _invoke_json
result = await generate()
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 74, in generate
await self._native_json(input, **{**kwargs, "name": call_name})
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\openai_chat_llm.py", line 108, in _native_json
json_output = try_parse_json_object(raw_output)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\site-packages\graphrag\llm\openai\utils.py", line 93, in try_parse_json_object
result = json.loads(input)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\10400\miniconda3\envs\graphrag_test\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
10:54:04,394 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
10:54:04,394 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 71
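Note that the log line shows the raw LLM output begins with a Markdown fence (`json=```json`), and `json.loads` fails with "Expecting value: line 1 column 1 (char 0)", i.e. at the very first character. A sketch of a more tolerant parse that strips an optional fence before decoding (this is my own illustration, not GraphRAG's actual `try_parse_json_object` implementation):

```python
import json
import re

def parse_fenced_json(raw: str):
    """Parse JSON that may be wrapped in a Markdown ```json ... ``` fence.

    The closing fence is optional, since truncated LLM output may
    open a fence without closing it.
    """
    match = re.search(r"```(?:json)?\s*(.*?)\s*(?:```|$)", raw, re.DOTALL)
    text = match.group(1) if match else raw
    return json.loads(text)
```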
I also tried another Chinese text, and it indexed normally when the content was UTF-8. Entities and relationships were generated correctly, including the graph. However, the Chinese characters in the intermediate output appear as Unicode escape sequences rather than readable characters. I hope this can be improved to emit the characters themselves; it looks like a character-encoding issue.
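If the `\uXXXX` escapes come from a serialization step using Python's `json` module (an assumption; I have not traced GraphRAG's writer), they are the default `ensure_ascii=True` behavior and are lossless. Passing `ensure_ascii=False` emits the characters directly:

```python
import json

record = {"title": "围城"}
# Default: non-ASCII characters are escaped as \uXXXX sequences.
print(json.dumps(record))                      # {"title": "\u56f4\u57ce"}
# ensure_ascii=False keeps the characters readable.
print(json.dumps(record, ensure_ascii=False))  # {"title": "围城"}
```

Either form round-trips through `json.loads` to the same string, so the escapes are cosmetic rather than data corruption.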
Yeah, I am able to index a Chinese web novel. You can refer to the document on my WeChat official account (weixin gongzhonghao).
I did it with the web novel Xian Ni (仙逆) and it succeeded.
Do you know how to fix this error?
same here. It appears randomly.
Check your logs. It is probably the LLM failing with "Error Invoking LLM", causing a ReadTimeout and finally a KeyError: 'community'.
Consolidating language support issues here: #696
My timeout error occurs during the text_embed verb of create_final_entities. How did you solve it?
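For timeouts during embedding, one common mitigation is to raise the request timeout and lower concurrency in settings.yaml. A hedged fragment (these field names match the default settings.yaml generated by graphrag at the time of this thread, but verify them against your installed version):

```yaml
# settings.yaml -- embeddings section (assumed field names, check your graphrag version)
embeddings:
  llm:
    request_timeout: 300.0   # raise the per-request timeout (seconds)
    max_retries: 10          # retry transient failures
    concurrent_requests: 5   # lower concurrency to reduce timeouts
```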
I have the same error. Have you resolved it?
I adjusted the embeddings section of the settings.yaml file. First, start an embedding model service with fastchat; reference commands below:
# Start the controller
python -m fastchat.serve.controller --host 127.0.0.1 --port 21001
# Start the model_worker
python -m fastchat.serve.model_worker --device cpu --model-names bge-m3 --model-path D:/Models/embedding/bge-m3 --controller-address http://127.0.0.1:21001 --worker-address http://127.0.0.1:8080 --host 0.0.0.0 --port 8080
# Start the OpenAI-compatible API server
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 9000
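Once the three fastchat processes above are running, you can sanity-check the server before pointing GraphRAG at it. A small sketch that builds an OpenAI-style /v1/embeddings request with only the standard library (`embedding_request` is my own helper, and I am assuming fastchat's default OpenAI-compatible route):

```python
import json
import urllib.request

def embedding_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Usage, assuming the openai_api_server above is listening on port 9000:
# with urllib.request.urlopen(embedding_request("http://127.0.0.1:9000", "bge-m3", ["测试"])) as r:
#     vec = json.loads(r.read())["data"][0]["embedding"]
```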
A reference settings.yaml configuration follows: