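KAG: Using the medical template to build a finance-domain KAG; the xlsx contains 7 articles, but after running indexer.py only two were extracted and written successfully, and writing SPO and chunks failed for the other 5. How can I fix this?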

Search before asking

[X] I had searched in the issues and found no similar issues.

Operating system information

Linux

What happened

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/kag/builder/component/extractor/kag_extractor.py", line 321, in invoke
    sub_graph, entities = self.assemble_sub_graph_with_spg_records(entities)
  File "/opt/conda/lib/python3.9/site-packages/kag/builder/component/extractor/kag_extractor.py", line 136, in assemble_s
    if prop_name in spg_type.properties:
AttributeError: 'NoneType' object has no attribute 'properties'
INFO:kag.builder.component.extractor.kag_extractor: 'NoneType' object has no attribute 'properties'

I am using the medical template to build a finance-domain KAG. The xlsx contains 7 articles, but after running indexer.py only two were extracted and written successfully; writing SPO and chunks failed for the remaining 5. I then re-ran extraction on those 5 articles, and also tried with only part of one article, but nothing was written to the local knowledge base. The error above appears during extraction. When answering questions, the result is sub_answer: I don't know; docs: []; spo_retrieved:[]; exactly_match: False.

The following error also appears:

query chunk failed: (400), knext.common.rest.exceptions.ApiException: (400), Failed to invoke procedure `db.index.fulltext.queryNodes`: caused by: org.apache.lucene.search.IndexSearcher$TooManyClauses: maxClauseCount is set to 1024
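The AttributeError above means `spg_type` was `None` when its `properties` were read, which is what a failed type lookup would produce, for example if an extracted entity carries a category that the (medical) schema does not define. The sketch below is not KAG's code; `SpgType`, `split_by_schema`, the `"category"` key, and the dict-shaped schema are all hypothetical names used only to illustrate checking for unknown types before assembling a sub-graph.

```python
from dataclasses import dataclass, field


@dataclass
class SpgType:
    """Hypothetical stand-in for a schema type; not KAG's real class."""
    name: str
    properties: dict = field(default_factory=dict)


def split_by_schema(entities, schema):
    """Separate entities whose category exists in the schema from those that do not.

    `entities` is assumed to be a list of dicts with a "category" key;
    `schema` is assumed to map a type name to an SpgType.
    """
    known, unknown = [], []
    for entity in entities:
        spg_type = schema.get(entity.get("category"))
        # A missing type is the case that would later raise
        # AttributeError: 'NoneType' object has no attribute 'properties'.
        if spg_type is None:
            unknown.append(entity)
        else:
            known.append(entity)
    return known, unknown


schema = {"Disease": SpgType("Disease", {"symptom": "Text"})}
entities = [
    {"name": "flu", "category": "Disease"},
    {"name": "ACME Corp", "category": "Company"},  # not defined in the medical schema
]
known, unknown = split_by_schema(entities, schema)
print([e["name"] for e in known])
print([e["name"] for e in unknown])
```

Listing the `unknown` entities for a failing article would show whether the finance documents are producing types that the medical schema does not cover.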

How to reproduce

The code is not open source. I would be grateful if you could suggest a solution or a direction based on the problem described above.

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Dec 18 '24 13:12 JaDonghao

Try a different LLM.

Dec 19 '24 01:12 thundax-lyp

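Hello, I just found that a case that previously succeeded now also fails at extraction and writing. What I did: I restarted the three services (mysql, neo4j, server) and then tested with a single article, which was extracted and written successfully. Questions: Do these three services have any limits? Why did two articles in the xlsx succeed while the other five failed? Is there a length limit on the content text in the xlsx? My seven articles average 20,000 characters each; could a length limit be causing the problem? Does the content need preprocessing, such as removing extra blank lines? Looking forward to your reply!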
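Nothing in this thread establishes that KAG enforces a length limit on the content column, so the following is only a sketch of the preprocessing mentioned in the question: collapsing extra blank lines and splitting a long article into smaller pieces. The function names and the `max_chars=2000` default are arbitrary assumptions, not KAG settings.

```python
import re


def normalize_whitespace(text):
    """Strip trailing spaces and collapse runs of blank lines into one."""
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


def split_into_chunks(text, max_chars=2000):
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is hard-split.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        while len(para) > max_chars:
            # Flush what has been collected, then cut the oversized paragraph.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


article = "First paragraph.   \n\n\n\nSecond paragraph.\n"
clean = normalize_whitespace(article)
print(repr(clean))
print(split_into_chunks(clean, max_chars=20))
```

Running one cleaned, shortened article through the indexer and comparing it with the original would show whether length or formatting plays any role in the failures.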


Dec 19 '24 03:12 JaDonghao

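Hello, I just found that a case that previously succeeded now also fails at extraction and writing. What I did: I restarted the three services (mysql, neo4j, server) and then tested with a single article, which was extracted and written successfully. Questions: Do these three services have any limits? Why did two articles in the xlsx succeed while the other five failed? Is there a length limit on the content text in the xlsx? My seven articles average 20,000 characters each; could a length limit be causing the problem? Does the content need preprocessing, such as removing extra blank lines? Looking forward to your reply! …

Have you found a solution for this? I also have a batch of analysis data that I want to process. After exporting it to CSV the results are poor as well, and nothing useful comes out of the analysis.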

Jan 26 '25 11:01 zjzjzjzj1874

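Hello, I just found that a case that previously succeeded now also fails at extraction and writing. What I did: I restarted the three services (mysql, neo4j, server) and then tested with a single article, which was extracted and written successfully. Questions: Do these three services have any limits? Why did two articles in the xlsx succeed while the other five failed? Is there a length limit on the content text in the xlsx? My seven articles average 20,000 characters each; could a length limit be causing the problem? Does the content need preprocessing, such as removing extra blank lines? Looking forward to your reply! …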

The LLM may occasionally return JSON in a non-standard format, which can cause parsing errors. Rerunning the program a few times usually resolves the error: successful cases are written to the checkpoint (ckpt) and are skipped during the rerun.
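The rerun-and-skip behaviour described above can be sketched as follows. This is not KAG's checkpoint implementation; `run_with_checkpoint`, `flaky_extract`, the JSON checkpoint file, and `max_rounds` are hypothetical, and the simulated failure stands in for an LLM reply that is not valid JSON.

```python
import json
import os
import tempfile


def run_with_checkpoint(doc_ids, extract, ckpt_path, max_rounds=3):
    """Retry failed documents, skipping ids already recorded in the checkpoint."""
    done = set()
    if os.path.exists(ckpt_path):
        with open(ckpt_path, encoding="utf-8") as f:
            done = set(json.load(f))
    for _ in range(max_rounds):
        pending = [d for d in doc_ids if d not in done]
        if not pending:
            break
        for doc_id in pending:
            try:
                extract(doc_id)
            except ValueError:
                # json.JSONDecodeError is a ValueError: leave the document
                # pending so the next round retries it.
                continue
            done.add(doc_id)
            with open(ckpt_path, "w", encoding="utf-8") as f:
                json.dump(sorted(done), f)
    return done


attempts = {}


def flaky_extract(doc_id):
    """Simulated extractor: "doc2" yields malformed JSON on its first attempt."""
    attempts[doc_id] = attempts.get(doc_id, 0) + 1
    first_try = doc_id == "doc2" and attempts[doc_id] == 1
    raw = "{not json" if first_try else '{"ok": true}'
    return json.loads(raw)


with tempfile.TemporaryDirectory() as tmp:
    finished = run_with_checkpoint(
        ["doc1", "doc2"], flaky_extract, os.path.join(tmp, "ckpt.json")
    )
print(sorted(finished))
print(attempts)
```

In this sketch the document that succeeded in the first round is never extracted again, so a rerun only costs LLM calls for the articles that failed.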

Apr 18 '25 07:04 caszkgui