feat: update and allow strict mode

Open jxnl opened this issue 10 months ago • 3 comments

addresses #612


:rocket: This description was created by Ellipsis for commit 291e3e59f937417d5dae99b3786850a40d091e41

Summary:

The pull request introduces a new strict parameter, defaulting to True, to several methods in the Instructor and AsyncInstructor classes and to the new_create_async and new_create_sync functions.

Key points:

  • Added a strict parameter to several methods in the Instructor and AsyncInstructor classes in instructor/client.py.
  • Added a strict parameter to the new_create_async and new_create_sync functions in instructor/patch.py.
  • The strict parameter is a boolean that defaults to True.

Generated with :heart: by ellipsis.dev

jxnl, Apr 21 '24 19:04

Deploying instructor with Cloudflare Pages

Latest commit: 291e3e5
Status: ✅  Deploy successful!
Preview URL: https://6128e958.instructor.pages.dev
Branch Preview URL: https://allow-strict-in-create.instructor.pages.dev

I think this is a good change and should be merged but it doesn't fulfill my intent behind #612.

#612 is about allowing control characters in JSON strings because this happens so commonly with Claude's models.

Pydantic's model_validate_json(..., strict=False) does not allow control characters in strings, but it does apply Pydantic's non-strict (lax) type coercion, which might be desirable to clients in some cases.

The standard library's json.loads(..., strict=False) does one thing: it allows control characters in JSON strings, which is what I want in #612.
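To make the json.loads behaviour concrete, here is a minimal standard-library sketch. The payload and variable names are made up for illustration; only json.loads and its strict keyword come from the standard library.

```python
import json

# Illustrative payload: the string value contains a raw (unescaped) newline,
# which is a control character and therefore not valid inside a JSON string.
raw = '{"summary": "line one\nline two"}'

# Default parsing (strict=True) rejects the control character.
try:
    json.loads(raw)
    default_parse_ok = True
except json.JSONDecodeError:
    default_parse_ok = False

# strict=False accepts control characters inside strings and changes nothing else.
lenient = json.loads(raw, strict=False)

# No type coercion happens: a quoted number stays a string.
untouched = json.loads('{"n": "1"}', strict=False)
```

Note that this flag is unrelated to Pydantic's notion of strictness: it only relaxes the decoder, it does not coerce or validate types.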

If you want to merge these non-strict semantics, the change looks like this for the JSON-parsing functions in function_calls.py:

    @classmethod
    def parse_anthropic_json(
        cls: Type[BaseModel],
        completion,
        validation_context: Optional[Dict[str, Any]] = None,
        strict: Optional[bool] = None,
    ) -> BaseModel:
        from anthropic.types import Message

        assert isinstance(completion, Message)

        text = completion.content[0].text
        extra_text = extract_json_from_codeblock(text)

        if strict:
            return cls.model_validate_json(
                extra_text, context=validation_context, strict=strict
            )
        else:
            # Allow control characters.
            parsed = json.loads(extra_text, strict=False)
            # Pydantic non-strict: https://docs.pydantic.dev/latest/concepts/strict_mode/
            return cls.model_validate(parsed, context=validation_context, strict=strict)

Maybe you don't want to merge these semantics in instructor's strict, in which case there would need to be two separate arguments to toggle these different capabilities.
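As a rough sketch of the two-argument alternative, assuming Pydantic v2: the helper name parse_json_text, the allow_control_chars flag, and the Note model are all hypothetical and are not part of instructor's API. strict is forwarded to Pydantic unchanged, while the second flag only controls the JSON decoder.

```python
import json
from typing import Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)


def parse_json_text(
    model_cls: Type[T],
    text: str,
    strict: Optional[bool] = None,      # Pydantic strictness, passed through as-is
    allow_control_chars: bool = False,  # hypothetical flag: decoder leniency only
) -> T:
    if allow_control_chars:
        # Decode with the stdlib so raw control characters in strings are accepted,
        # then validate the resulting Python object with Pydantic.
        parsed = json.loads(text, strict=False)
        return model_cls.model_validate(parsed, strict=strict)
    # Otherwise let Pydantic parse and validate the JSON text in one step.
    return model_cls.model_validate_json(text, strict=strict)


class Note(BaseModel):  # illustrative model
    title: str
    priority: int


# Raw newline inside a string, plus a quoted integer that needs lax coercion.
raw = '{"title": "first\nsecond", "priority": "2"}'

note = parse_json_text(Note, raw, allow_control_chars=True)
```

With this split, a caller could accept control characters while still asking Pydantic for strict types, or the reverse, which a single strict argument cannot express.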

If this is functionality you want in instructor I'm happy to submit a PR subject to however you want to design this.

voberoi, Apr 22 '24 13:04

This functionality was made possible at some point, not sure when it was removed: https://github.com/jxnl/instructor/pull/75

voberoi, Apr 22 '24 15:04

Let's allow this too. I'll merge this first. Sorry for the delay, I was on vacation!

jxnl, Apr 28 '24 00:04