langextract
Schema constraints for OpenAI not supported
Description
Currently, LangExtract supports schema-constrained extraction through the use_schema_constraints option, but this feature is not available with OpenAI models (via the openai backend) unless fence_output=True. However, OpenAI now officially supports JSON Schema, which allows structured output without relying on output fencing. OpenAI's JSON schema mechanism does not support YAML output.
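As a rough illustration of the OpenAI mechanism referred to above, the sketch below wraps a JSON Schema in the response_format shape that OpenAI's Structured Outputs uses for Chat Completions. The helper name build_openai_response_format and the extraction schema are hypothetical; LangExtract's actual output schema may look different. The resulting dict is what one would pass as response_format= to client.chat.completions.create(...) in the openai Python SDK. No API call is made here.

```python
import json


def build_openai_response_format(schema: dict, name: str = "extractions") -> dict:
    """Wrap a JSON Schema in the shape OpenAI Structured Outputs expects
    for the Chat Completions `response_format` parameter (hypothetical helper)."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,      # schema identifier: letters, digits, _ and -
            "schema": schema,  # the JSON Schema the output must follow
            "strict": True,    # ask the API to enforce the schema exactly
        },
    }


# Hypothetical extraction schema. Strict mode expects every property to be
# listed in "required" and "additionalProperties": false on each object.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "extractions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "extraction_class": {"type": "string"},
                    "extraction_text": {"type": "string"},
                },
                "required": ["extraction_class", "extraction_text"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["extractions"],
    "additionalProperties": False,
}

response_format = build_openai_response_format(EXTRACTION_SCHEMA)
print(json.dumps(response_format, indent=2))
```

Because the schema is enforced by the API, the model returns bare JSON, which is why output fencing would no longer be needed in this mode.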
Feature Request
Enable the use of use_schema_constraints=True with fence_output=False when leveraging OpenAI models, but restrict this functionality to cases where FormatType.JSON is used. If users select FormatType.YAML, the schema constraint option should remain unavailable or raise a clear exception.
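The gating rule requested above could be sketched as follows. This is not LangExtract code: FormatType is a local stand-in with assumed values, and use_native_json_schema is a hypothetical helper. It raises a clear error for YAML, as the request suggests, and (as an additional assumption) only enables native JSON schema mode when fence_output is False, leaving fenced output on the existing code path.

```python
from enum import Enum


class FormatType(Enum):
    """Local stand-in for LangExtract's FormatType (values assumed)."""
    JSON = "json"
    YAML = "yaml"


def use_native_json_schema(
    format_type: FormatType, use_schema_constraints: bool, fence_output: bool
) -> bool:
    """Decide whether to request OpenAI's native JSON schema mode (hypothetical)."""
    if not use_schema_constraints:
        return False  # no schema requested; keep the current behaviour
    if format_type is FormatType.YAML:
        # OpenAI's JSON schema mode only produces JSON, so fail loudly.
        raise ValueError(
            "use_schema_constraints=True is not supported for OpenAI models "
            "with FormatType.YAML. Use FormatType.JSON or set "
            "use_schema_constraints=False."
        )
    # Assumption: schema-constrained output is bare JSON, so native mode is
    # used only when fencing is off; fenced output keeps the existing path.
    return not fence_output
```

Raising, rather than silently dropping the schema, makes the YAML limitation visible at configuration time instead of surfacing later as unvalidated output.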
Acceptance Criteria
- When language_model_type is OpenAILanguageModel (e.g., with "gpt-4o" or "o4-mini") and FormatType.JSON is used, allow use_schema_constraints=True with fence_output=False.
- When FormatType.YAML is specified with OpenAI models, do not enable schema constraints; provide an explicit error message or fallback instead.
- Update documentation, tests, and error messages to clarify this limitation and the newly supported use case for OpenAI.
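The acceptance criteria leave room for a fallback instead of an error in the YAML case. One possible shape for that option is sketched below: schema constraints are disabled with a UserWarning rather than raising. The function name is hypothetical, and format_type is a plain string ("json" or "yaml") only to keep the sketch self-contained.

```python
import warnings


def resolve_schema_constraints(format_type: str, use_schema_constraints: bool) -> bool:
    """Fallback variant (hypothetical): for YAML output with OpenAI models,
    warn and continue without schema constraints instead of raising."""
    if use_schema_constraints and format_type == "yaml":
        warnings.warn(
            "Schema constraints are not supported for OpenAI models with "
            "YAML output; continuing without them.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return use_schema_constraints
```

A warning keeps existing YAML pipelines running, at the cost of a limitation that is easier to overlook than an exception.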
References/Related Work
- Existing Gemini output schema enforcement implementation.
- Possibility of leveraging libraries like pydantic for runtime validation.
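To illustrate the runtime-validation idea, the sketch below checks a model response against the shape of a hypothetical extraction schema using only the standard library. It is a minimal stand-in for what a pydantic model would do (with pydantic v2, one would define a BaseModel and call its model_validate_json method); the field names extraction_class and extraction_text are assumptions, not LangExtract's confirmed output format.

```python
import json


def validate_extractions(raw: str) -> list[dict]:
    """Parse model output and check the shape a hypothetical schema expects.

    Minimal stdlib stand-in for pydantic-style runtime validation.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict) or not isinstance(data.get("extractions"), list):
        raise ValueError("expected an object with an 'extractions' list")
    for i, item in enumerate(data["extractions"]):
        for key in ("extraction_class", "extraction_text"):
            if not isinstance(item, dict) or not isinstance(item.get(key), str):
                raise ValueError(f"extractions[{i}] is missing string field {key!r}")
    return data["extractions"]
```

Even with API-side schema enforcement, a check like this guards against responses that were truncated or produced without strict mode.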