llm-structured-output-benchmarks

Add Formatron framework

Open adrianeboyd opened this issue 1 year ago • 2 comments

Summary by Sourcery

Add the FormatronFramework to the project, enabling new tasks like multilabel classification and synthetic data generation with specific model configurations. Update the configuration file to include settings for the new framework.

New Features:

  • Introduce the FormatronFramework to support tasks such as multilabel classification and synthetic data generation using the 'unsloth/llama-3-8b-Instruct-bnb-4bit' model.

Enhancements:

  • Add configuration for the FormatronFramework in the config.yaml file, specifying tasks, model details, and parameters.

adrianeboyd · Sep 21 '24 13:09

Reviewer's Guide by Sourcery

This pull request adds Formatron, a library for constraining LLM output to a structured format, to the benchmark. The changes add configuration for the Formatron framework to config.yaml and implement the FormatronFramework class in a new file, frameworks/formatron_framework.py.

File-Level Changes

Change: Added configuration for the Formatron framework
Files: config.yaml
Details:
  • Configured tasks for multilabel classification and synthetic data generation
  • Set up parameters such as n_runs, prompt, LLM model, and other initialization arguments
  • Commented out configuration for NER task

Change: Implemented FormatronFramework class
Files: frameworks/formatron_framework.py
Details:
  • Created a new class that inherits from BaseFramework
  • Implemented initialization method with model loading and configuration
  • Added support for different tasks (multilabel classification and others)
  • Implemented run method for executing experiments
  • Integrated with Formatron library for formatting and processing
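
As a rough illustration of the class shape described above (a subclass of BaseFramework with an initializer and a run method), the following Python sketch may help. Everything in it is an assumption made for illustration: the `BaseFramework` stub, the `FormatronFrameworkSketch` name, the injected `generate` callable, and the prompt text are hypothetical and are not taken from the pull request. The real class loads a model and calls the Formatron library, which this sketch deliberately replaces with a fake generator so that it runs without a GPU.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class BaseFramework(ABC):
    """Hypothetical stand-in for the repository's BaseFramework."""

    def __init__(self, task: str, prompt: str, llm_model: str) -> None:
        self.task = task
        self.prompt = prompt
        self.llm_model = llm_model

    @abstractmethod
    def run(self, n_runs: int, inputs: dict) -> list:
        """Execute the task n_runs times and return the raw outputs."""


class FormatronFrameworkSketch(BaseFramework):
    """Illustrative shape only; not the class added in this pull request."""

    def __init__(self, generate: Callable[[str], Any], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # Stand-in for "model + constrained decoding". It is injected so the
        # sketch runs without a GPU or the Formatron package installed.
        self._generate = generate

    def run(self, n_runs: int, inputs: dict) -> list:
        # Fill the prompt template once, then repeat the generation n_runs times.
        prompt = self.prompt.format(**inputs)
        return [self._generate(prompt) for _ in range(n_runs)]


def fake_generate(prompt: str) -> dict:
    # Pretend the model returned a valid multilabel response.
    return {"classes": ["greeting"], "prompt_chars": len(prompt)}


framework = FormatronFrameworkSketch(
    generate=fake_generate,
    task="multilabel_classification",
    prompt="Classify the following text: {text}",  # assumed template
    llm_model="unsloth/llama-3-8b-Instruct-bnb-4bit",
)
outputs = framework.run(n_runs=2, inputs={"text": "hello there"})
```

Injecting the generator keeps the benchmark loop separate from the decoding backend, which is one plausible way to make the run logic testable; the actual implementation may be organized differently.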

sourcery-ai[bot] · Sep 21 '24 13:09

Some example results (1 run instead of 10, on an RTX A5000):

  • multilabel classification

           Reliability
Outlines          1.00
Formatron         0.99

           Latency_p95(s)
Outlines            1.804
Formatron          13.710

  • ner required fields

                  Reliability
Outlines                 1.00
Formatron                0.99
LMFormatEnforcer         0.98

                  Latency_p95(s)
Formatron                 16.950
Outlines                  31.033
LMFormatEnforcer          45.598

          framework  micro_precision  micro_recall  micro_f1
0          Outlines         0.656250      0.546243  0.596215
1         Formatron         0.762590      0.614493  0.680578
2  LMFormatEnforcer         0.648464      0.562130  0.602219

  • synthetic data generation

           Reliability
Formatron          1.0

           Latency_p95(s)
Formatron           4.761

           Variety
Formatron      0.6

adrianeboyd · Sep 21 '24 13:09
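
For readers comparing the NER numbers above, here is a small Python sketch that recomputes micro_f1 from the reported micro_precision and micro_recall, and expresses each framework's p95 latency relative to the fastest one. It assumes micro_f1 is the usual harmonic mean of precision and recall; the benchmark's own metric code is not shown in this thread, so treat this as a consistency check rather than the project's implementation. The helper and variable names are hypothetical.

```python
def micro_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (micro_precision, micro_recall, reported micro_f1) copied from the NER table.
ner_scores = {
    "Outlines": (0.656250, 0.546243, 0.596215),
    "Formatron": (0.762590, 0.614493, 0.680578),
    "LMFormatEnforcer": (0.648464, 0.562130, 0.602219),
}
recomputed = {name: micro_f1(p, r) for name, (p, r, _) in ner_scores.items()}

# p95 latency in seconds for the NER task, copied from the table above.
ner_latency_p95 = {
    "Formatron": 16.950,
    "Outlines": 31.033,
    "LMFormatEnforcer": 45.598,
}
fastest = min(ner_latency_p95, key=ner_latency_p95.get)
# Latency of each framework as a multiple of the fastest framework's latency.
relative_latency = {
    name: seconds / ner_latency_p95[fastest]
    for name, seconds in ner_latency_p95.items()
}

for name in ner_scores:
    print(f"{name}: f1={recomputed[name]:.6f}, latency x{relative_latency[name]:.1f}")
```

The recomputed F1 values agree with the reported column to within rounding. Note that these figures come from a single run rather than the usual 10, so small differences between frameworks should be read with caution.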