ResolverParsingError ("Content must contain an 'extractions' key.")
LangExtract: model=gpt-5, current=9,585 chars, processed=9,585 chars: [00:00]
ERROR:absl:Content does not contain 'extractions' key.
LangExtract: model=gpt-5, current=9,585 chars, processed=9,585 chars: [01:19]
Traceback (most recent call last):
  File "/Users/prit/Documents/GitHub/Ramco/SB2ED/venv/lib/python3.13/site-packages/langextract/resolver.py", line 222, in resolve
    extraction_data = self.string_to_extraction_data(input_text)
  File "/Users/prit/Documents/GitHub/Ramco/SB2ED/venv/lib/python3.13/site-packages/langextract/resolver.py", line 388, in string_to_extraction_data
    raise ResolverParsingError("Content must contain an 'extractions' key.")
langextract.resolver.ResolverParsingError: Content must contain an 'extractions' key.
I am facing this issue with many models apart from Gemini.
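For reference, this is roughly the shape of the call I am making (my real prompt and examples are from a custom schema, and I've omitted whatever extra provider configuration the non-Gemini backends need); the keyword names follow the LangExtract README as I understand it:

```python
import langextract as lx

# Illustrative example only; my actual examples follow my own schema.
examples = [
    lx.data.ExampleData(
        text="Example sentence with an entity.",
        extractions=[
            lx.data.Extraction(
                extraction_class="entity",
                extraction_text="an entity",
            ),
        ],
    ),
]

result = lx.extract(
    text_or_documents=open("my_source_doc.txt").read(),
    prompt_description="Extract the entities described in the examples.",
    examples=examples,
    model_id="gpt-5",  # non-Gemini models hit the ResolverParsingError above
)
```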
I also got this even with gemini-2.5-{flash,pro}, although it took much longer before it threw the exception compared with smaller local models. Some more detail:
When debugging I saw that the model was producing extractions (following the shape of the examples), but they were emitted individually at the top level rather than grouped in a list under the 'extractions' key. It was not clear to me why they must be grouped this way.
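To illustrate, this is roughly the difference I was seeing (the field names here are made up, not my real schema):

```python
# Shape the resolver apparently expects, per the error message: a single object
# with an "extractions" list.
expected = {
    "extractions": [
        {"entity": "some span", "attributes": {"type": "example"}},
    ]
}

# Roughly what the model was emitting instead: the extractions as a top-level
# list, with no "extractions" key wrapping them.
observed = [
    {"entity": "some span", "attributes": {"type": "example"}},
]
```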
This did not happen with the R&J examples regardless of model, only with my custom examples and source documents.
Hi @pritkudale,
Thanks for reporting this. Gemini models implement controlled generation to ensure schema compliance, which is why they work more reliably.
This should improve as more models add structured output support (#99). GPT-5 and others will likely have similar capabilities as the community implements plugins with schema support.
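For anyone curious what controlled generation means in practice, here is a sketch using the google-genai SDK directly (this is not how LangExtract wires it up internally, just an illustration): the response is constrained to a schema, so a key like "extractions" is always present in the output.

```python
from pydantic import BaseModel
from google import genai
from google.genai import types

class Extraction(BaseModel):
    extraction_class: str
    extraction_text: str

class ExtractionResult(BaseModel):
    extractions: list[Extraction]

client = genai.Client()  # reads the API key from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract character names from: Lady Juliet gazed at the stars.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=ExtractionResult,  # output is constrained to this schema
    ),
)
print(response.text)  # JSON that matches ExtractionResult
```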
Thanks for your patience while broader model support rolls out.
Related: #128, #127
I am using Gemma 3 models and getting the same error - curious what the timeline is for Gemma support? Thank you!
Hi @aksg87,
I've found this issue occurs commonly when using Gemma 3 locally. It appears related to how LangExtract chunks the text for the LLM to process. If a chunk ends up not containing any of the entities of interest, the LLM sends a polite refusal message (as it was trained to do).
This refusal is not valid JSON, and lx.extract rejects the whole job even if the rest of the chunks were valid. This commonly happens at the start or end of a document and is more likely with smaller context windows. It is frustrating because the very features that make LangExtract compelling are the source of this error. It is also not a very transparent error, so it took a while to find the root cause.
Why doesn't lx.extract just give a warning rather than exiting? One idea: if the LLM response is short and not valid JSON, replace it with an empty extractions JSON so the function doesn't fail. Equally, you could mark the span in the JSONL file as not containing relevant extraction entities (in debug mode?).
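A rough sketch of that fallback, assuming it would sit wherever LangExtract first sees the raw chunk output (the function name and length threshold here are made up):

```python
import json

EMPTY_RESULT = '{"extractions": []}'

def sanitize_chunk_output(raw_output: str, max_refusal_len: int = 400) -> str:
    """Replace short, non-JSON model replies (likely refusals) with an empty result."""
    try:
        json.loads(raw_output)
        return raw_output  # already valid JSON, pass it through untouched
    except json.JSONDecodeError:
        if len(raw_output) <= max_refusal_len:
            # Treat a short unparseable reply as a polite refusal: warn and move on
            # instead of aborting the whole extraction job.
            print(f"Warning: replacing non-JSON chunk output: {raw_output[:80]!r}")
            return EMPTY_RESULT
        raise  # long malformed output probably signals a real problem
```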
It would be great to have a warning-only mode, or at least a way to preserve the existing extractions when it crashes. I had a run that went on for 20 minutes and only failed near the end, and I lost everything before that point.
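As a stopgap, one idea (my own sketch, with the lx.extract keyword names as I understand them from the README) is to split the document yourself, call lx.extract per piece, and keep whatever succeeds. This bypasses LangExtract's own chunking and alignment, so spans won't line up with the full document, but earlier results at least survive one bad chunk:

```python
import langextract as lx
from langextract.resolver import ResolverParsingError

def extract_piecewise(text, prompt, examples, model_id, piece_chars=8000):
    """Run lx.extract on fixed-size pieces so one bad chunk doesn't lose the rest."""
    results, failed_pieces = [], []
    pieces = [text[i:i + piece_chars] for i in range(0, len(text), piece_chars)]
    for idx, piece in enumerate(pieces):
        try:
            results.append(lx.extract(
                text_or_documents=piece,
                prompt_description=prompt,
                examples=examples,
                model_id=model_id,
            ))
        except ResolverParsingError:
            failed_pieces.append(idx)  # keep going; remember which piece failed
            print(f"Warning: piece {idx} failed to parse; keeping earlier results.")
    return results, failed_pieces
```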
Are there any open-source models that work well and tend to throw this error less? I have been trying qwen3:14b and it doesn't work.
I guess that because some LLM model responses contain a
phi4 works pretty well