Clean the string and extract only the JSON part
Description
The JSON parsing error in global search occurs mainly because some open-source LLMs return responses that are not strictly JSON. For example, when asked about the main theme of the story, Llama3 gives an answer like this:
Here is a response consisting of a list of key points that summarizes the top themes in the provided data:
{"points": [ {"description": "The theme of community and social dynamics is prominent, with the Men's Elations and Sorrows community revolving around men experiencing elations and sorrows. [Data: Reports (1)]", "score": 80}, {"description": "The potential for threat or conflict is a significant theme, with Harmony Assembly's march at Verdant Oasis Plaza being a potential source of threat. [Data: Reports (6), Relationships (38, 43)]", "score": 70}}
Note that these scores are subjective and based on my interpretation of the data provided.
This is not valid JSON because of the additional text before and after the expected JSON object.
Related Issues
Proposed Changes
- Added a regular expression to extract JSON content from the search response string
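
The following is a minimal sketch of the kind of extraction described above, not the code from this PR. It assumes the JSON object is the span from the first `{` to the last `}` in the response; the function name `extract_json_object` and the sample string are hypothetical.

```python
import json
import re


def extract_json_object(response: str) -> dict:
    """Parse the span from the first "{" to the last "}" in `response` as JSON."""
    # Greedy match with DOTALL so the span can cross newlines and
    # includes nested objects up to the final closing brace.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    # json.loads still raises json.JSONDecodeError if the span is malformed.
    return json.loads(match.group(0))


# Hypothetical response with prose before and after the JSON payload.
raw = (
    "Here is a response consisting of a list of key points:\n"
    '{"points": [{"description": "Community and social dynamics", "score": 80}]}\n'
    "Note that these scores are subjective."
)
points = extract_json_object(raw)["points"]
```

One limitation of this approach: if the surrounding prose itself contains braces, the greedy match captures too much and `json.loads` raises `json.JSONDecodeError`, so callers would still need a fallback for that case.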
Checklist
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [x] I have updated the documentation (if necessary).
- [x] I have added appropriate unit tests (if applicable).
Additional Notes
Can we get this merged? Otherwise, almost every local LLM fails with:
SUCCESS: Global Search Response: I am sorry but I am unable to answer this question given the provided data.
(as seen in #575)
We have resolved several issues related to text encoding and JSON parsing; the fixes are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.