llm-structured-output-benchmarks
Add polyfactory framework
Add a framework that generates mock responses using polyfactory.
Related to #1.
Summary by Sourcery
This pull request adds a new framework, PolyfactoryFramework, which generates mock responses using the polyfactory library. Configuration for this framework has been added to `config.yaml`, and the framework is imported in `frameworks/__init__.py`.
- New Features:
- Introduced PolyfactoryFramework to generate mock responses using the polyfactory library.
- Enhancements:
- Updated `config.yaml` to include configuration for PolyfactoryFramework.
- Modified `frameworks/__init__.py` to import PolyfactoryFramework.
Reviewer's Guide by Sourcery
This pull request introduces a new framework, PolyfactoryFramework, which generates mock responses using the polyfactory library. The changes include updates to the configuration file, the framework initialization file, and the addition of a new framework implementation file.
File-Level Changes
| Files | Changes |
|---|---|
| `config.yaml`<br>`frameworks/__init__.py`<br>`frameworks/polyfactory_framework.py` | Introduced PolyfactoryFramework to generate mock responses, updated configuration and initialization files, and added the new framework implementation. |
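For reviewers who haven't used polyfactory: the mechanism is small. Below is a minimal sketch of the idea, using an illustrative response model and factory names rather than the exact contents of `frameworks/polyfactory_framework.py`: hand polyfactory the Pydantic response model and let it build schema-valid instances, ignoring the input text entirely.

```python
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class ClassificationResponse(BaseModel):
    # Illustrative response model; the benchmark supplies its own.
    labels: List[str]


class ResponseFactory(ModelFactory[ClassificationResponse]):
    # polyfactory builds random, schema-valid instances of the model.
    __model__ = ClassificationResponse


def mock_response(_text: str) -> ClassificationResponse:
    # The input text is ignored; the output depends only on the schema.
    return ResponseFactory.build()


print(mock_response("some document to classify"))
```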
(This sourcery thing seems noisier and less capable than a linter.)
Hey, I ran your branch locally and have some comments:
- The framework would better suit the `Synthetic Data Generation` task than the `Multi-label classification` task. I'm planning that out and will submit some updates to your branch once it's ready.
- It seems that the framework generates only synthetic `labels` without corresponding `text`, and I'm not sure polyfactory can do both. So we may need to change the `response_model` used to something more suitable for synthetic data generation (see the sketch below). I'm still thinking about this as part of the `Synthetic Data Generation` task too.
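Roughly what I mean, as a hypothetical sketch (this model isn't in the PR): polyfactory will populate a `text` field, but by default only with a random character string rather than realistic prose, so the labels it builds have no meaningful relationship to the text.

```python
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class SyntheticExample(BaseModel):
    # Hypothetical shape for a synthetic data generation response.
    text: str
    labels: List[str]


class SyntheticFactory(ModelFactory[SyntheticExample]):
    __model__ = SyntheticExample


sample = SyntheticFactory.build()
# sample.text is a random character string, not a realistic document,
# and sample.labels are random strings unrelated to it.
print(sample)
```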
So for now, I'll keep this PR open and revisit it once I have the Synthetic Data Generation task up and running.
I kind of disagree: I think it's more reasonable to have it available as a comparison for classification tasks, where a random baseline is sensible. It would be much less interesting for a synthetic generation task (unless, I guess, the task is boring enough that you should be using faker instead).
It doesn't refer to the input text because it's just generating a random list of labels from the provided schema. (In a few tests where I limited the number of possible labels to match the sampling setup and then sampled more data, it had something like 0.1-0.3% accuracy. I would also guess that a majority baseline might be better than a random baseline, but I didn't try that.)
And this was all a bit facetious; I didn't necessarily expect it to be merged, since adding any accuracy metrics to the table would make you immediately want to eliminate it. The point was just that you can easily generate the structure and a random baseline from the response model.
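To make that concrete, here's a toy sketch (made-up label space and gold annotations, nothing from the benchmark) of getting a random baseline straight from the response model and comparing it with a trivial majority baseline:

```python
from enum import Enum
from typing import List

from polyfactory.factories.pydantic_factory import ModelFactory
from pydantic import BaseModel


class Label(str, Enum):
    # Toy label space standing in for the benchmark's real labels.
    BILLING = "billing"
    SHIPPING = "shipping"
    RETURNS = "returns"


class Response(BaseModel):
    labels: List[Label]


class RandomBaseline(ModelFactory[Response]):
    # Because the field is an Enum, polyfactory samples from the label space.
    __model__ = Response


# Toy gold annotations, purely for illustration.
gold = [{"billing"}, {"shipping"}, {"billing"}, {"returns"}]

random_preds = [{lab.value for lab in RandomBaseline.build().labels} for _ in gold]
majority_preds = [{"billing"}] * len(gold)  # always predict the most common label

for name, preds in [("random", random_preds), ("majority", majority_preds)]:
    acc = sum(p == g for p, g in zip(preds, gold)) / len(gold)
    print(f"{name} exact-match accuracy: {acc:.2f}")
```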
Ah, got it. Good point! Yep, I'm definitely on the lookout for a suitable dataset so I can include an accuracy metric that will immediately flag a random label generator as inaccurate. I'm currently prioritizing getting the code up so that datasets can be easily swapped in and out.