llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on tasks like multi-label classification, named entity recognition, and synthetic data generation.
Hi there, interesting benchmark. Any chance to add Pydantic-ai? I would be curious to see how well it performs compared to the others.
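For context, a PydanticAI entry would presumably look something like the sketch below. The model string, schema, and parameter names are assumptions on my part (and newer pydantic-ai releases rename `result_type`/`.data` to `output_type`/`.output`), so treat this as illustrative rather than the benchmark's actual framework code.

```python
# Hedged sketch of structured output with PydanticAI (assumed names, not this repo's code).
from pydantic import BaseModel
from pydantic_ai import Agent


class MultiLabelPrediction(BaseModel):
    labels: list[str]  # predicted labels for the input text


# "openai:gpt-4o-mini" is just an example model string.
agent = Agent("openai:gpt-4o-mini", result_type=MultiLabelPrediction)

result = agent.run_sync(
    "Classify: 'My package arrived broken and support never replied.'"
)
print(result.data.labels)  # result_type/.data are output_type/.output in newer releases
```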
Add a framework that generates mock responses using `polyfactory`. Related to #1.

## Summary by Sourcery

This pull request adds a new framework, PolyfactoryFramework, which generates mock responses using the...
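As a rough illustration of the approach (the schema and class names here are my own, not necessarily the repo's), polyfactory builds schema-valid objects with random field values, which gives an "always valid, never accurate" baseline:

```python
# Hedged sketch: generating mock responses with polyfactory for a Pydantic schema.
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class MultiLabelPrediction(BaseModel):
    labels: List[str]


class MultiLabelPredictionFactory(ModelFactory[MultiLabelPrediction]):
    __model__ = MultiLabelPrediction


# Each call returns a valid MultiLabelPrediction with random contents,
# so the "framework" is guaranteed to produce parseable output at zero cost.
mock = MultiLabelPredictionFactory.build()
print(mock.labels)
```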
## Summary by Sourcery

Add the FormatronFramework to the project, enabling new tasks like multilabel classification and synthetic data generation with specific model configurations. Update the configuration file to include...
In order to have an NER model that is simpler for internal regex/CFG representations, add an NER variant that requires all fields and does not include a default value. In...
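A minimal sketch of what such a required-fields variant might look like (field names are assumptions, not necessarily the repo's exact schema):

```python
# All fields are mandatory and have no defaults, so regex/CFG-constrained
# backends never need to encode optional branches or default-value fallbacks.
from typing import List

from pydantic import BaseModel


class NamedEntitiesStrict(BaseModel):
    persons: List[str]        # must be present, even if empty
    organizations: List[str]  # must be present, even if empty
    locations: List[str]      # must be present, even if empty
```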
Hi, it's nice to come across a cross-library/model benchmark like this! When looking at evaluations for structured output libraries, I feel like "valid response" is such a low bar when...