instructor
Is there a way to parse JSON in non-strict mode?
Is your feature request related to a problem? Please describe.
Anthropic's models regularly emit unescaped control characters in their strings, producing invalid JSON that causes validation to fail.
Describe the solution you'd like
I'd like the option to parse JSON by passing strict=False as the docs over here indicate I can.
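For anyone reproducing this, here is a minimal stdlib-only sketch of the failure and of what strict=False changes. The payload is made up for illustration; it is not an actual model response.

```python
import json

# Illustrative payload: the "\n" below is a real newline character inside a
# JSON string value, which the JSON spec does not allow unescaped.
raw = '{"summary": "line one\nline two"}'

try:
    json.loads(raw)  # strict=True is the default
except json.JSONDecodeError as exc:
    print(f"strict parse failed: {exc.msg}")

# strict=False tells the stdlib decoder to accept control characters in strings.
data = json.loads(raw, strict=False)
print(data["summary"].splitlines())
```

Note that this is the stdlib meaning of strict, which only concerns control characters inside strings.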
Describe alternatives you've considered
- Catching the Pydantic validation error, checking if the type is json_invalid, and then parsing the JSON on my own.
- I've poked around the codebase. It looks like this was possible at one point, and the current code has docstrings indicating that it still is, but strict is not a valid param.
I can't figure out how to do this, or whether this is functionality you intend the library to have (or once did but no longer do).
I second this.
It looks like instructor uses Pydantic's model_validate_json. It has a strict param which doesn't have the same semantics as strict in json.loads(..., strict=False).
It's possible to do model_validate(json.loads(..., strict=False)) -- is that something you'd consider allowing?
It's less performant and it won't work in a streaming context, but it gives clients an out when models misbehave with control characters.
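A rough sketch of what that client-side workaround can look like, assuming Pydantic v2. The Note model and the parse_lenient helper are hypothetical names used only for illustration; they are not part of instructor.

```python
import json

from pydantic import BaseModel, ValidationError


class Note(BaseModel):  # hypothetical response model
    text: str


def parse_lenient(raw: str) -> Note:
    """Validate raw JSON, retrying non-strictly if the JSON itself is invalid."""
    try:
        # Fast path: Pydantic parses and validates in one step.
        return Note.model_validate_json(raw)
    except ValidationError as exc:
        # Only retry on a JSON syntax error; schema errors are re-raised.
        if not any(err["type"] == "json_invalid" for err in exc.errors()):
            raise
    # Slow path: the stdlib parser tolerates control characters in strings.
    return Note.model_validate(json.loads(raw, strict=False))


# The "\t" here is a real tab character inside the JSON string.
note = parse_lenient('{"text": "first\tsecond"}')
```

If the input is broken in some other way (truncated, for example), the json.loads call in the slow path raises json.JSONDecodeError, so callers still see a failure.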
My proposal is something like:
- Clients can pass in control_characters_allowed=True or something to instructor.from_{client}
- If that param is True:
  - Streaming responses are not allowed.
  - JSON is parsed using json from the stdlib using strict=False before Pydantic validation.
Some other ideas:
- Only do this as a fallback when the validation error is because JSON is invalid due to control characters in strings.
- Give clients the option to fail the first time there's an error due to control characters (vs. going through all the retries) so they can recover with: https://github.com/jxnl/instructor/commit/339c22ec58abec1d425fe1d0556406c66721a5f5.
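The first idea above could be sketched with the stdlib alone, as below. This relies on an assumption: that CPython's decoder reports this case with a message beginning "Invalid control character", which is an implementation detail and not a documented contract. The function name is hypothetical, and how instructor would detect the same condition from a Pydantic error is not shown.

```python
import json


def loads_with_control_char_fallback(raw: str):
    """Parse strictly first; relax only for control-character failures."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Assumption: CPython's message for a raw control character inside a
        # string starts with this text. Any other syntax error is re-raised.
        if not exc.msg.startswith("Invalid control character"):
            raise
    # Reached only when the strict failure was a control character.
    return json.loads(raw, strict=False)


# The "\n" here is a real newline inside the JSON string.
recovered = loads_with_control_char_fallback('{"a": "x\ny"}')
```

Other kinds of invalid JSON, such as a truncated response, still raise on the first attempt, so retries or client-side recovery behave as before for those cases.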
Or maybe your thought is to let clients handle this particular error. I'm doing that now, but I think it would be nice to bake handling this into instructor instead.
Links:
- https://docs.pydantic.dev/2.7/concepts/json/
- https://docs.pydantic.dev/latest/concepts/performance/#in-general-use-model_validate_json-not-model_validatejsonloads
I'd actually rather expose strict=False to the create call's patch. Let me try to spend 10 minutes on this; if you don't hear back, I'd take a PR too!
https://github.com/jxnl/instructor/pull/618 please take a look
Hey! Thanks for your help on this.
Continuing the discussion here since #618 is merged.
I'm happy to provide a patch for this functionality.
What do you think of merging Pydantic's and json.load's "strict" semantics, as in this example here? https://github.com/jxnl/instructor/pull/618#issuecomment-2069506854
Or would you prefer to split these up?
i think it makes sense esp since control characters are an issue
Great -- will merge those semantics. I'll take a crack at a patch.
Fixed in #644.