fix(builder): handle both dict and string formats for core_entities in knowledge unit extractor
Summary
This PR fixes the AttributeError: 'dict' object has no attribute 'split' error that occurs during knowledge extraction, as reported in issue #714.
Problem
The knowledge unit extractor failed whenever the LLM returned the core_entities field as a dictionary, because the code expected a string and called .split(",") on the value.
Root Cause
The LLM can return core_entities in two different formats depending on the language and prompt:
- String format (Chinese example; the key means "core entities" and the value reads "thermal power generation, year-on-year growth rate, 2019"):
  "核心实体": "火电发电量,同比增长率,2019年"
- Dict format (English example):
  "Core Entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
The code at kag/builder/component/extractor/knowledge_unit_extractor.py:587 only handled the string format:
for item in knowledge_value.get("core_entities", "").split(","):
    # This fails when core_entities is a dict!
    ...
Error Stack Trace
AttributeError: 'dict' object has no attribute 'split'
File "/kag/builder/component/extractor/knowledge_unit_extractor.py", line 587, in assemble_knowledge_unit
for item in knowledge_value.get("core_entities", "").split(","):
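The failure can be reproduced outside the extractor. The sketch below is illustrative only: `split_only` is a hypothetical stand-in for the pre-fix loop, and the payloads assume the field has already been mapped to the `core_entities` key, as in the snippet quoted above.

```python
def split_only(knowledge_value):
    """Hypothetical stand-in for the pre-fix logic: assumes a string value."""
    raw = knowledge_value.get("core_entities", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


# Payloads modeled on the two examples in this PR description.
string_payload = {"core_entities": "火电发电量,同比增长率,2019年"}
dict_payload = {
    "core_entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
}

# The string format works as the old code expected.
string_result = split_only(string_payload)

# The dict format raises, because dict has no .split method.
try:
    split_only(dict_payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(string_result)
print(error_message)
```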
Solution
Modified the assemble_knowledge_unit method in knowledge_unit_extractor.py to handle both formats gracefully:
core_entities_raw = knowledge_value.get("core_entities", "")
# Start from an empty dict so every branch leaves core_entities defined
core_entities = {}
# Handle both string and dict formats for core_entities
if isinstance(core_entities_raw, dict):
    # Dict format: {entity_name: entity_type}
    core_entities = core_entities_raw
elif isinstance(core_entities_raw, str):
    # String format: comma-separated values
    for item in core_entities_raw.split(","):
        if not item.strip():
            continue
        core_entities[item.strip()] = "Others"
else:
    # Handle unexpected types gracefully with logging
    logger.warning(
        f"Unexpected type for core_entities: {type(core_entities_raw)}, "
        f"expected str or dict. Value: {core_entities_raw}"
    )
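For readers who want to try the branching logic in isolation, it can be sketched as a standalone function. This is not how the PR structures the change (the PR edits assemble_knowledge_unit in place); `normalize_core_entities` and its `default_type` parameter are hypothetical names introduced here for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_core_entities(raw, default_type="Others"):
    """Return core entities as {entity_name: entity_type}.

    Hypothetical helper mirroring the three branches described above.
    """
    if isinstance(raw, dict):
        # Dict format: already maps entity name to entity type.
        return dict(raw)
    if isinstance(raw, str):
        # String format: comma-separated names, blanks skipped.
        return {
            name.strip(): default_type
            for name in raw.split(",")
            if name.strip()
        }
    # Any other type: warn and fall back to an empty mapping.
    logger.warning(
        "Unexpected type for core_entities: %s, expected str or dict. Value: %r",
        type(raw),
        raw,
    )
    return {}


from_string = normalize_core_entities("T.I., No Mediocre,,")
from_dict = normalize_core_entities({"T.I.": "Person"})
from_empty = normalize_core_entities("")
from_invalid = normalize_core_entities(42)
```

Returning a copy of the dict rather than the original object is a design choice of this sketch, not of the PR; it keeps later mutations from leaking back into the parsed LLM response.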
Testing
- Experiment Scripts: Created test scripts in the experiments/ directory to verify that the fix handles all scenarios:
  - String format (Chinese)
  - Dict format (English)
  - Empty strings
  - Missing fields
  - Invalid types (with proper logging)
- Unit Tests: Added test_knowledge_unit_core_entities.py with test coverage for all core_entities formats
- Code Quality: All changes pass flake8 validation
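The contents of the test files are not reproduced in this description. As a rough sketch of what checks for the five scenarios might look like, the example below uses the standard-library unittest module against a hypothetical stand-in function, `parse_core_entities`, rather than the real extractor; the logger name is also an assumption.

```python
import logging
import unittest

logger = logging.getLogger("core_entities_sketch")  # hypothetical logger name


def parse_core_entities(knowledge_value):
    """Hypothetical stand-in for the fixed branch of assemble_knowledge_unit."""
    raw = knowledge_value.get("core_entities", "")
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        return {i.strip(): "Others" for i in raw.split(",") if i.strip()}
    logger.warning("Unexpected type for core_entities: %s", type(raw))
    return {}


class CoreEntitiesFormatsTest(unittest.TestCase):
    def test_string_format(self):
        value = {"core_entities": "火电发电量,同比增长率,2019年"}
        self.assertEqual(len(parse_core_entities(value)), 3)

    def test_dict_format(self):
        value = {"core_entities": {"T.I.": "Person"}}
        self.assertEqual(parse_core_entities(value), {"T.I.": "Person"})

    def test_empty_string(self):
        self.assertEqual(parse_core_entities({"core_entities": ""}), {})

    def test_missing_field(self):
        self.assertEqual(parse_core_entities({}), {})

    def test_invalid_type_logs_warning(self):
        # assertLogs fails the test if no WARNING record is emitted.
        with self.assertLogs(logger, level="WARNING") as captured:
            self.assertEqual(parse_core_entities({"core_entities": ["a"]}), {})
        self.assertIn("Unexpected type", captured.output[0])


def run_sketch_tests():
    """Run the suite without calling sys.exit, so it can be embedded."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CoreEntitiesFormatsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```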
Changes
- Modified: kag/builder/component/extractor/knowledge_unit_extractor.py - Added type checking and handling for both dict and string formats
- Added: tests/unit/builder/component/test_knowledge_unit_core_entities.py - Unit tests for the fix
- Added: experiments/test_core_entities_handling.py - Experiment script demonstrating the issue and fix
- Added: experiments/test_fix.py - Verification script for all scenarios
Related Issues
Fixes #714
Generated with Claude Code