Knowledge extraction is failing. How can I fix this?
```
Traceback (most recent call last):
  File "/mnt/SSD/home/zxy24/KAG/kag/builder/prompt/default/util.py", line 180, in check_data
    info = json.loads(
  File "/mnt/SSD/home/zxy24/anaconda3/envs/openspg/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/mnt/SSD/home/zxy24/anaconda3/envs/openspg/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/mnt/SSD/home/zxy24/anaconda3/envs/openspg/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
2025-10-16 21:34:50 - ERROR - root - Failed to process data {'id': 'Alû', 'name': 'Alû', 'content': 'In Akkadia'}, info: Traceback (most recent call last):
  File "/mnt/SSD/home/zxy24/KAG/kag/builder/runner.py", line 207, in process
    result = await self.chain.ainvoke(data)
  File "/mnt/SSD/home/zxy24/KAG/kag/interface/builder/builder_chain_abc.py", line 164, in ainvoke
    outputs = await asyncio.gather(*tasks)
  File "/mnt/SSD/home/zxy24/KAG/kag/interface/builder/builder_chain_abc.py", line 134, in execute_node
    results = await asyncio.gather(*tasks)
  File "/mnt/SSD/home/zxy24/KAG/kag/interface/builder/builder_chain_abc.py", line 126, in ainvoke_with_semaphore
    return await node.ainvoke(item)
  File "/mnt/SSD/home/zxy24/KAG/kag/interface/builder/base.py", line 215, in ainvoke
    output = await self._ainvoke(input_data, **kwargs)
  File "/mnt/SSD/home/zxy24/KAG/kag/builder/component/extractor/knowledge_unit_extractor.py", line 671, in _ainvoke
    knowledge_unit_nodes = self.assemble_knowledge_unit(
  File "/mnt/SSD/home/zxy24/KAG/kag/builder/component/extractor/knowledge_unit_extractor.py", line 587, in assemble_knowledge_unit
    for item in knowledge_value.get("core_entities", "").split(","):
AttributeError: 'dict' object has no attribute 'split'
100%|██████████████████████████████████████████████████| 8/8 [02:57<00:00, 22.23s/it]
Done process 8 records, with 0 found in checkpoint, 0 successfully processed and 8 failures encountered. The log file is located at ckpt/kag_checkpoint_0_1.ckpt. Please access this file to obtain detailed task statistics.
2025-10-16 21:34:50 - INFO - __main__ -
buildKB successfully for /mnt/SSD/home/zxy24/KAG/kag/examples/HotpotQATest/builder/data/sub_corpus.json
```
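The JSONDecodeError("Expecting value", s, err.value) from None error means the text passed to json.loads was not valid JSON. Either the model is not reachable, so the call returned no data, or the prompt (or a similar issue) caused the model to return malformed output. Check first whether you can connect to the model.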
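To tell these two cases apart, it can help to inspect the raw model reply before it reaches json.loads. The sketch below uses a hypothetical helper, parse_llm_json (it is not part of KAG), which treats an empty reply as a failed model call and, as an assumption about common LLM output, strips a surrounding Markdown code fence before parsing. As a point of reference, a reply made of a leading space followed by plain text (for example " Sorry") is one input that reproduces the exact message in the log, Expecting value: line 1 column 2 (char 1).

```python
import json
import re


def parse_llm_json(raw):
    """Hypothetical helper: parse an LLM reply as JSON.

    Returns the parsed object, or None if the reply is empty or not valid JSON.
    """
    if raw is None or not raw.strip():
        # An empty reply usually points at the model call itself failing
        return None
    text = raw.strip()
    # Assumption: some models wrap JSON in a Markdown fence such as ```json ... ```
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if match:
        text = match.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Show the error together with the start of the raw reply for debugging
        print(f"Model output is not valid JSON ({exc}): {raw[:200]!r}")
        return None
```

Calling it on the string the extractor received (for example, by temporarily logging it in check_data) shows whether the model returned nothing, prose, or fenced JSON.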
🎯 Solution Implemented
I've analyzed and fixed this issue. The problem was that the core_entities field returned by the LLM can come in two different formats:
- String format (typically Chinese):
  "核心实体": "火电发电量,同比增长率,2019年" (that is, "core entities": "thermal power generation, year-on-year growth rate, 2019")
- Dict format (typically English):
  "Core Entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
The code only handled the string format and called .split(",") on the value unconditionally, so a dict value raised AttributeError: 'dict' object has no attribute 'split'.
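A minimal sketch of what type-safe handling can look like, assuming the extractor only needs the entity names. normalize_core_entities is a hypothetical helper and not the code from the PR; accepting lists and full-width commas (,) is an extra assumption beyond the two formats shown above.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def normalize_core_entities(value: Any) -> List[str]:
    """Hypothetical helper: return core entity names whether the LLM produced
    a comma-separated string or a {name: type} dict."""
    if isinstance(value, dict):
        # Dict format: keys are entity names, values are entity types
        names = value.keys()
    elif isinstance(value, str):
        # String format: also accept the full-width comma used in Chinese text
        names = value.replace(",", ",").split(",")
    elif isinstance(value, list):
        # Assumption: a plain list of names may also appear
        names = value
    else:
        if value is not None:
            logger.warning("Unexpected core_entities type: %s", type(value).__name__)
        return []
    # Trim whitespace and drop empty items such as those from "A, , B"
    return [str(n).strip() for n in names if str(n).strip()]
```

With this in place, a loop such as for item in normalize_core_entities(knowledge_value.get("core_entities")) works for both formats instead of calling .split(",") directly.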
Fix Applied
Modified kag/builder/component/extractor/knowledge_unit_extractor.py to handle both formats gracefully with proper type checking and error logging.
Pull Request
The fix has been submitted in PR #717. It includes:
- ✅ Type-safe handling of both dict and string formats
- ✅ Comprehensive unit tests
- ✅ Experiment scripts demonstrating the fix
- ✅ All code quality checks passing (flake8, black)
The PR is ready for review: https://github.com/OpenSPG/KAG/pull/717
I found that the model's output structure does not match the format KAG expects. The following approach also resolves it. First, modify kag_config.yaml:

```yaml
extractor:
  type: knowledge_unit_extractor
  llm: *openie_llm
  ner_prompt:
    type: knowledge_unit_ner
    prompt: |
      You are an information extraction assistant.
      Extract NER results strictly as a dictionary of strings.
      All fields MUST be strings, not lists or objects.
      Output JSON format:
      {
        "entities": "EntityA, EntityB",
        "core_entities": "Entity1, Entity2"
      }
      Text:
      {text}
  triple_prompt:
    type: knowledge_unit_triple
    prompt: |
      Extract triples strictly as comma-separated strings.
      REQUIREMENTS:
      - ALL fields MUST be strings.
      - NEVER output arrays or objects.
      EXAMPLE CORRECT:
      {
        "entities": "A, B, C",
        "relations": "r1, r2",
        "core_entities": "X, Y",
        "summary": "..."
      }
      Text:
      {text}
  kn_prompt:
    type: knowledge_unit
    prompt: |
      You are an information extraction model.
      Extract the knowledge unit from the text.
      REQUIREMENTS:
      - All fields MUST be strings.
      - DO NOT return arrays or objects.
      - DO NOT return dictionaries.
      - If multiple items, join them with commas.
      - core_entities MUST be a comma-separated STRING, not a JSON object.
      STRICT OUTPUT FORMAT (copy exactly):
      {
        "core_entities": "Entity1, Entity2",
        "summary": "One sentence summary",
        "entities": "EntityA, EntityB",
        "relations": "Relation1, Relation2"
      }
      EXAMPLE OF WRONG OUTPUT (NEVER DO THIS):
      {
        "core_entities": {"A": "Type1", "B": "Type2"} <-- wrong
      }
      EXAMPLE OF CORRECT OUTPUT:
      {
        "core_entities": "A, B"
      }
      Text:
      {text}
```
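Prompt instructions are not guaranteed to be followed, so it may be worth checking a few raw replies against the flat "all fields must be strings" format these prompts request before rerunning the build. find_non_string_fields is a hypothetical check written for this thread, not a KAG function, and it only inspects top-level keys.

```python
import json
from typing import List


def find_non_string_fields(reply: str) -> List[str]:
    """Hypothetical check: return the top-level keys of a JSON reply whose
    values are not strings (the prompts above require strings only)."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Lists and nested objects are exactly what the prompts forbid
    return [key for key, value in data.items() if not isinstance(value, str)]
```

An empty list means the reply matches the requested format; any key listed (such as core_entities holding a dict) would still hit the .split(",") error.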
Then update knowledge_unit_extractor.py:

```shell
vi /root/KAG/kag/builder/component/extractor/knowledge_unit_extractor.py
```

```python
def assemble_knowledge_unit(...):
    knowledge_unit_nodes = []
    knowledge_units = dict(input_knowledge_units)

    # --- BEGIN: Fix Qwen output core_entities being dict ---
    for k, v in knowledge_units.items():
        if not isinstance(v, dict):
            continue  # skip malformed units instead of failing on v.get()
        core = v.get("core_entities")
        if isinstance(core, dict):
            # Keep only the entity names (the dict keys) as a comma-separated string
            v["core_entities"] = ", ".join(core.keys())
    # --- END: Fix Qwen output core_entities being dict ---

    def triple_to_knowledge_unit(triple):
        ...
```
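Because the patched method's signature is elided, the same loop can be tried outside KAG as a standalone function. flatten_core_entities and the sample unit name are hypothetical; the dict value is the "Core Entities" example quoted earlier in the thread.

```python
from typing import Any, Dict


def flatten_core_entities(knowledge_units: Dict[str, Any]) -> Dict[str, Any]:
    """Standalone version of the patch: turn a dict-valued core_entities into
    the comma-separated string that the original .split(",") code expects."""
    # Shallow copy, as in the patch: the inner unit dicts are still modified in place
    units = dict(knowledge_units)
    for name, unit in units.items():
        if not isinstance(unit, dict):
            continue  # leave malformed units untouched
        core = unit.get("core_entities")
        if isinstance(core, dict):
            unit["core_entities"] = ", ".join(core.keys())
    return units


units = {
    "No Mediocre": {
        "core_entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
    }
}
print(flatten_core_entities(units)["No Mediocre"]["core_entities"])
```

Joining only the keys drops the entity types; that matches what the original string format carries, so downstream code that splits on commas sees the same shape either way.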