ExampleData creeping into final result

Open skasiai opened this issue 3 months ago • 3 comments

I noticed that the example data I had provided showed up in the output when extracting data from a completely different text block.

Unfortunately I did not capture the data to share here, but I will try to reproduce it and share the data. This is a major issue IMO: I was evaluating this library for use in my project, but with this issue the correctness of the output is unreliable.

skasiai, Sep 15 '25 17:09

Hi @skasiai, thanks for the comment. Definitely share more details if possible so the community can better understand the problem. A few suggestions: I would tune the instructions and the balance between the example content and the content you are processing. Also, try testing on Gemini 2.5 Pro or another strong model as a starting point. I used to notice this type of issue with models a year ago, but in my experience it's less common with the latest models and sufficient prompt engineering.

aksg87, Sep 16 '25 03:09

@aksg87 this also happens with gpt-4o. I guess it will also happen with smaller and weaker models. Could the framework just remove from the result any values that were given as examples?
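
For illustration, here is a minimal sketch of what such a cleanup could look like on the caller's side. It is not part of langextract: `Extracted` is a hypothetical stand-in for one extracted item, and `drop_leaked_examples` is a made-up helper name. It assumes you have the example values and the source text at hand, and it only drops a value when it matches an example and does not occur in the text being processed, so values that legitimately appear in both are kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extracted:
    """Hypothetical stand-in for one extracted item (not a langextract type)."""
    label: str
    text: str


def _norm(s: str) -> str:
    # Collapse whitespace and ignore case so trivial differences still match.
    return " ".join(s.split()).casefold()


def drop_leaked_examples(results, example_values, source_text):
    """Split results into (kept, dropped).

    An item is dropped only if its text matches one of the example values
    AND that text does not occur in the source being processed.
    """
    examples = {_norm(v) for v in example_values}
    source = _norm(source_text)
    kept, dropped = [], []
    for item in results:
        value = _norm(item.text)
        if value in examples and value not in source:
            dropped.append(item)
        else:
            kept.append(item)
    return kept, dropped


# Usage: "Ibuprofen" came from the few-shot example, not from this text.
results = [
    Extracted("medication", "Aspirin"),
    Extracted("medication", "Ibuprofen"),
]
kept, dropped = drop_leaked_examples(
    results,
    example_values=["Ibuprofen"],
    source_text="Patient was given aspirin 100 mg daily.",
)
```

The substring check is deliberately simple and would miss paraphrased leaks, so treat it as a safety net rather than a fix for the underlying prompt.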

intval, Oct 30 '25 22:10

@intval I think one thing you could try is using very generic examples and making it clear in the instructions that the examples are for illustrative purposes only. Aside from parsing the output, the library doesn't control how the model performs, so working on this through the prompt instructions and examples is likely the most effective approach. Sharing a full log of the result would also help others provide more feedback!

aksg87, Nov 02 '25 11:11