json.decoder.JSONDecodeError: Unterminated string starting at
Hello.
I get JSON parse errors from resolver.py, probably due to wrongly structured output from the LLM.
code:
```python
import langextract as lx  # assumed import; omitted from the original snippet

try:
    result_generator = lx.extract(
        text_or_documents=comment_texts,
        prompt_description=topic_prompt,
        examples=topic_examples,
        model_id="gemini-2.5-flash",
        debug=True,
    )
except Exception as e:
    print(f"Caught a parsing error: {e}")

print(f"Extracted {len(result_generator.str)} entities from {len(result_generator.text):,} characters")

# trying to save output for debug
with open("jsondebugtest.jsonl", "w") as f:
    for result in result_generator:
        f.write(result.text)
        f.write("\n")
```
error trace:
```
LangExtract: model=gemini-2.5-flash, current=322 chars, processed=322 chars: [00:00]ERROR:absl:Failed to parse content.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/langextract/resolver.py", line 349, in _extract_and_parse_content
parsed_data = json.loads(content)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Unterminated string starting at: line 4 column 16 (char 42)
LangExtract: model=gemini-2.5-flash, current=322 chars, processed=322 chars: [04:42]Caught a parsing error: Failed to parse content.
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-1112811029.py in <cell line: 0>()
19 print(f"Caught a parsing error: {e}")
20
---> 21 print(f"Extracted {len(result_generator.str)} entities from {len(result_generator.text):,} characters")
22 with open("jsondebugtest.jsonl", "w") as f:
23 for result in result_generator:
NameError: name 'result_generator' is not defined
```
Is there any way to output the raw LLM output / inference API response before it goes to resolver.py? It's hard to get a proper prompt and properly structured output without an easy way to debug the LLM output.
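In the meantime, one workaround is to query the model directly with roughly the same prompt text and inspect the raw response before anything reaches LangExtract. A minimal sketch using the google-genai client (the prompt assembly below is a hypothetical approximation; LangExtract builds its actual prompt internally from the description and examples):

```python
# Not a LangExtract feature: call Gemini directly to see the raw, unparsed
# model output. Assumes the google-genai package is installed and
# GEMINI_API_KEY is set in the environment.
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"{topic_prompt}\n\nText:\n{comment_texts[0]}",  # hypothetical prompt assembly
)
print(response.text)  # raw model output, before any JSON parsing
```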
What you said is indeed correct. Adding a parameter that lets the raw LLM response be displayed before parsing would be helpful for debugging and for identifying the root cause of the problem (such as verifying whether the LLM is outputting the JSON format you expect). Note that when deploying a local LLM with Ollama, the model needs to be configured to support JSON output so that LangExtract's JSON parser can handle it; if the output is not in JSON format, parsing will fail. It is also worth noting that for some models you can instruct them in the prompt to output JSON. However, this method is not 100% reliable and may still cause some parsing failures. A better solution is to switch to an Ollama-supported model that can output JSON.
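For reference, a minimal sketch of Ollama's native JSON mode using the official ollama Python client, independent of LangExtract (the model name is only an example):

```python
import ollama

# format="json" constrains the model to emit valid JSON, which is what a
# downstream JSON parser expects; "gemma2" is an example model name.
response = ollama.generate(
    model="gemma2",
    prompt="Extract the main topics from this comment and return them as JSON.",
    format="json",
)
print(response["response"])  # should now be a parseable JSON string
```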
You can try adding the parameter fence_output=True when calling the extract method. It may help.
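For example, applied to the snippet at the top of this thread (a sketch; fence_output tells the resolver to expect the JSON wrapped in Markdown code fences):

```python
result = lx.extract(
    text_or_documents=comment_texts,
    prompt_description=topic_prompt,
    examples=topic_examples,
    model_id="gemini-2.5-flash",
    fence_output=True,  # expect the JSON wrapped in Markdown code fences
)
```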
I am using Gemma models locally, which do not support structured output (i.e., they generate raw JSON text output).
To overcome this error, I placed further instructions in the prompt, as shown in the example below:
Return your answer as a JSON object with this format:

```
{
  "extractions": [
    {
      "extraction_class": "exclusion",
      "extraction_text": "exact text from the policy document",
      "attributes": {...}
    }
  ]
}
```
As a result, I no longer face the same issue.
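Concretely, that amounts to appending the format block to the prompt description passed to extract. A sketch, reusing the variable names from the snippet at the top of this thread and the local Ollama endpoint from the LangExtract README:

```python
format_instructions = """
Return your answer as a JSON object with this format:
{
  "extractions": [
    {
      "extraction_class": "...",
      "extraction_text": "exact text from the source document",
      "attributes": {...}
    }
  ]
}
"""

result = lx.extract(
    text_or_documents=comment_texts,
    prompt_description=topic_prompt + format_instructions,
    examples=topic_examples,
    model_id="gemma2:2b",                # example local model served by Ollama
    model_url="http://localhost:11434",  # default Ollama address
    fence_output=False,                  # raw JSON, no Markdown fences
    use_schema_constraints=False,
)
```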
Hi @wisteriesDev,
This parsing error should be addressed by PR #239, which introduces a centralized FormatHandler for consistent parsing across all providers. The PR includes proper fence detection and fallback mechanisms to handle various output formats, including edge cases like missing or malformed JSON/YAML structures.
If parsing errors persist after the PR is merged, please reopen with specific examples and reproduction steps.
Thank you for reporting this issue and helping improve LangExtract!