Measure Structured Generation Quality Via "Logprob Margin"

Open lapp0 opened this issue 1 year ago • 0 comments

Presentation of the new feature

Structured generation outputs often suffer from poor quality due to suboptimal model selection, prompts, or formatting (e.g., missing chat templates). Outlines should offer a simple tool to assess the quality of a structured generation pipeline.

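A useful metric could be the "logprob margin": at each generation step, the logprob of the top legal token (the most likely token the structure allows) minus the logprob of the top token overall. The margin is zero when the model's preferred token is already legal, and negative when the constraint forces the model away from its first choice.

A higher logprob margin (closer to zero) would indicate that the prompt / model is well suited for the structured generation task at hand.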
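To make the proposal concrete, here is a minimal sketch of the per-step computation. It is not Outlines code: the function names `log_softmax` and `logprob_margin`, and the representation of the constraint as a set of legal token ids, are assumptions made for illustration. It also assumes the sign convention "top legal logprob minus top overall logprob", so the result is never positive.

```python
import math
from typing import Sequence, Set


def log_softmax(logits: Sequence[float]) -> list:
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def logprob_margin(logits: Sequence[float], legal_token_ids: Set[int]) -> float:
    """Hypothetical helper: top legal logprob minus top overall logprob.

    Returns 0.0 when the model's preferred token is legal, and a negative
    value when the constraint overrides the model's first choice.
    """
    if not legal_token_ids:
        raise ValueError("at least one legal token is required")
    logprobs = log_softmax(logits)
    top_overall = max(logprobs)
    top_legal = max(logprobs[i] for i in legal_token_ids)
    return top_legal - top_overall


# Toy vocabulary of three tokens; token 0 is the model's favorite.
step_logits = [2.0, 1.0, 0.0]
print(logprob_margin(step_logits, {0, 1}))  # favorite token is legal
print(logprob_margin(step_logits, {1, 2}))  # favorite token is masked out
```

Averaging this value over all steps of a generation, or over a set of prompts, would give one possible pipeline-level score; how best to aggregate is left open here.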

Where does it fit in Outlines?

This feature would fit well in outlines.generate, or as a method in the outlines.models.OutlinesModel base class following a refactor.

Are you willing to open a PR?

Yes, I'd like to submit a PR once I have the time, as this would be a valuable addition.

Oct 01 '24 16:10 lapp0
