llm-structured-output-benchmarks icon indicating copy to clipboard operation
llm-structured-output-benchmarks copied to clipboard

Add NER model variant with required fields

Open adrianeboyd opened this issue 1 year ago • 2 comments

In order to have an NER model that is simpler for internal regex/CFG representations, add an NER variant that requires all fields and does not include a default value.

In particular, this makes it possible to evaluate a version of NER for Outlines and provides an additional point of comparison for other libraries.

Summary by Sourcery

Add a new NER model variant 'ner_required_fields' to support simpler internal representations by requiring all fields without default values. Update configuration and experiment framework to accommodate this new task.

New Features:

  • Introduce a new NER model variant called 'ner_required_fields' that requires all fields and does not include default values, facilitating simpler internal regex/CFG representations.

Enhancements:

  • Update the configuration to include the 'ner_required_fields' task with specific parameters for model initialization and execution.
  • Modify the experiment framework to support the 'ner_required_fields' task, allowing it to be executed alongside existing tasks.

adrianeboyd avatar Sep 21 '24 13:09 adrianeboyd

Reviewer's Guide by Sourcery

This pull request introduces a new NER (Named Entity Recognition) model variant with required fields. The changes primarily affect the configuration, base framework, data models, and main execution file. The new variant aims to simplify internal regex/CFG representations and provide an additional point of comparison for other libraries.

File-Level Changes

Change Details Files
Added a new NER task variant with required fields
  • Introduced 'ner_required_fields' task in the configuration
  • Added initialization parameters for the new task
  • Created a new data model function 'ner_required_fields_model'
config.yaml
frameworks/base.py
data_sources/data_models.py
Updated task lists and conditional logic to include the new NER variant
  • Added 'ner_required_fields' to the list of allowed tasks
  • Modified conditional statements to handle the new task type
frameworks/base.py
main.py
Adjusted metric calculation to support the new NER variant
  • Updated the condition for NER metric calculation to include 'ner_required_fields'
  • Modified the NER micro metrics calculation to support the new task
frameworks/base.py
main.py

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • Continue your discussion with Sourcery by replying directly to review comments.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

sourcery-ai[bot] avatar Sep 21 '24 13:09 sourcery-ai[bot]

I have no attachment to the name NERRequiredFields, you can treat it as a placeholder.

Some example results (just 1 run instead of 10, on an RTX A5000, I'm not sure what's going on with the LMFormatEnforcer latency):

                  Reliability
Outlines                 1.00
Formatron                0.99
LMFormatEnforcer         0.98

                  Latency_p95(s)
Formatron                 16.950
Outlines                  31.033
LMFormatEnforcer          45.598

          framework  micro_precision  micro_recall  micro_f1
0          Outlines         0.656250      0.546243  0.596215
1         Formatron         0.762590      0.614493  0.680578
2  LMFormatEnforcer         0.648464      0.562130  0.602219

The LMFormatEnforcer F1 is a lot higher than for the current NER (and my initial expectation was that required fields version would lower the performance).

adrianeboyd avatar Sep 21 '24 13:09 adrianeboyd