llm-structured-output-benchmarks
Add polyfactory framework
Add a framework that generates mock responses using polyfactory.
Related to #1.
Summary by Sourcery
This pull request adds a new framework, PolyfactoryFramework, which generates mock responses using the polyfactory library. Configuration for this framework has been added to `config.yaml`, and the framework is imported in `frameworks/__init__.py`.
- New Features:
- Introduced PolyfactoryFramework to generate mock responses using the polyfactory library.
- Enhancements:
- Updated `config.yaml` to include configuration for PolyfactoryFramework.
- Modified `frameworks/__init__.py` to import PolyfactoryFramework.
Reviewer's Guide by Sourcery
This pull request introduces a new framework, PolyfactoryFramework, which generates mock responses using the polyfactory library. The changes include updates to the configuration file, the framework initialization file, and the addition of a new framework implementation file.
File-Level Changes
| Files | Changes |
|---|---|
| `config.yaml`<br>`frameworks/__init__.py`<br>`frameworks/polyfactory_framework.py` | Introduced PolyfactoryFramework to generate mock responses, updated configuration and initialization files, and added the new framework implementation. |
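For reviewers who haven't used polyfactory: the mechanism is small. Below is a minimal sketch of the idea, using an illustrative response model and factory names rather than the exact contents of `frameworks/polyfactory_framework.py`: hand polyfactory the Pydantic response model and let it build schema-valid instances, ignoring the input text entirely.

```python
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class ClassificationResponse(BaseModel):
    # Illustrative response model; the benchmark supplies its own.
    labels: List[str]


class ResponseFactory(ModelFactory[ClassificationResponse]):
    # polyfactory builds random, schema-valid instances of the model.
    __model__ = ClassificationResponse


def mock_response(_text: str) -> ClassificationResponse:
    # The input text is ignored; the output depends only on the schema.
    return ResponseFactory.build()


print(mock_response("some document to classify"))
```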
(This sourcery thing seems noisier and less capable than a linter.)
Hey, I ran your branch locally and have some comments:
- The framework would better suit the `Synthetic Data Generation` task than the `Multi-label classification` task. I'm planning that out and will submit some updates to your branch once it's ready.
- It seems that the framework generates only synthetic `labels` without corresponding `text`, and I'm not sure polyfactory can do both. So we may need to change the `response_model` used to something more suitable for synthetic data generation (see the sketch below). I'm still thinking about this as part of the `Synthetic Data Generation` task too.
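Roughly what I mean, as a hypothetical sketch (this model isn't in the PR): polyfactory will populate a `text` field, but by default only with a random character string rather than realistic prose, so the labels it builds have no meaningful relationship to the text.

```python
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class SyntheticExample(BaseModel):
    # Hypothetical shape for a synthetic data generation response.
    text: str
    labels: List[str]


class SyntheticFactory(ModelFactory[SyntheticExample]):
    __model__ = SyntheticExample


sample = SyntheticFactory.build()
# sample.text is a random character string, not a realistic document,
# and sample.labels are random strings unrelated to it.
print(sample)
```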
So for now, I'll keep this PR open and revisit it once I have the Synthetic Data Generation task up and running.
I kind of disagree: I think it's more reasonable to have it available as a comparison for classification tasks, where a random baseline is sensible. It would be much less interesting for a synthetic generation task (unless, I guess, the task is boring enough that you should be using faker instead).
It doesn't refer to the input text because it's just generating a random list of labels from the provided schema. (In a few tests where I limited the number of possible labels to match the sampling setup and then sampled more data, it had something like 0.1-0.3% accuracy. I would also guess that a majority baseline might be better than a random baseline, but I didn't try that.)
And this was all a bit facetious; I didn't necessarily expect it to be merged, since adding any accuracy metrics to the table would make you immediately want to eliminate it. The point was just that you can easily generate the structure and a random baseline from the response model.
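To make that concrete, here's a toy sketch (made-up label space and gold annotations, nothing from the benchmark) of getting a random baseline straight from the response model and comparing it with a trivial majority baseline:

```python
from enum import Enum
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class Label(str, Enum):
    # Toy label space standing in for the benchmark's real labels.
    BILLING = "billing"
    SHIPPING = "shipping"
    RETURNS = "returns"


class Response(BaseModel):
    labels: List[Label]


class RandomBaseline(ModelFactory[Response]):
    # Because the field is an Enum, polyfactory samples from the label space.
    __model__ = Response


# Toy gold annotations, purely for illustration.
gold = [{"billing"}, {"shipping"}, {"billing"}, {"returns"}]

random_preds = [{lab.value for lab in RandomBaseline.build().labels} for _ in gold]
majority_preds = [{"billing"}] * len(gold)  # always predict the most common label

for name, preds in [("random", random_preds), ("majority", majority_preds)]:
    acc = sum(p == g for p, g in zip(preds, gold)) / len(gold)
    print(f"{name} exact-match accuracy: {acc:.2f}")
```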
Ah, got it. Good point! Yep, I'm definitely on the lookout for a suitable dataset so I can include an accuracy metric that will immediately flag a random label generator as inaccurate. I'm currently prioritizing getting the code up so that datasets can be easily swapped in and out.