graphrag
JSON parsing: always fix all incoming json when using _manual_json
Description
This change set adds non-LLM-based repair of malformed JSON as a preliminary step before falling back to the more resource-intensive LLM-based fixup.
More Details
While running GraphRAG with a local Ollama model, I noticed frequent malformed JSON responses from LLM requests, significantly slowing down the process on an M1 Max MacBook. In a fast, parallel cloud inference system, this issue is manageable, but locally it becomes a bottleneck. After indexing, I found 140 instances of JSON parsing failures.
The `json_repair` library effectively fixed the malformed JSON in my tests. I opted not to delve into the specific parsing failure cases, as they are mainly LLM-related and predicting every edge case is impractical. This library should be robust enough to handle most local LLM faults.
Related Issues
- https://github.com/microsoft/graphrag/issues/345
Proposed Changes
- Add `json_repair` as a new Poetry dependency.
  - https://pypi.org/project/json-repair/
- Use `json_repair` for initial JSON repair in `_manual_json`, with a fallback to the LLM.
- Apply JSON repair when graph search JSON parsing fails.
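
The repair-then-fallback order described above can be sketched roughly as follows. This is an illustrative outline, not the code in this PR: `parse_json_with_repair`, `repair`, and `llm_fallback` are hypothetical names, and the repair step is passed in as a callable so the sketch runs with the standard library alone.

```python
import json
from typing import Any, Callable, Optional


def parse_json_with_repair(
    raw: str,
    repair: Callable[[str], str],
    llm_fallback: Optional[Callable[[str], str]] = None,
) -> Any:
    """Parse `raw` as JSON, running a cheap local repair before any LLM call.

    `repair` and `llm_fallback` are hypothetical hooks: each takes the raw
    text and returns a (hopefully valid) JSON string.
    """
    # Step 1: always run the inexpensive, non-LLM repair on the incoming text.
    repaired = repair(raw)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        pass  # local repair was not enough; continue to the fallback

    # Step 2: only now pay for the LLM-based fixup, if one was provided.
    if llm_fallback is None:
        raise ValueError("JSON could not be parsed after local repair")
    return json.loads(llm_fallback(raw))
```

With the `json-repair` package installed, its `repair_json` function could plausibly be passed as `repair`. How `_manual_json` actually wires this up is in the diff itself.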
Checklist
- [x] Tested these changes locally.
- [x] Reviewed the code changes.
- [x] Updated documentation (if necessary).
- [ ] Added appropriate unit tests (if applicable).