Measure Structured Generation Quality Via "Logprob Margin"
Presentation of the new feature
Structured generation outputs often suffer from poor quality due to suboptimal model selection, prompts, or formatting (e.g., missing chat templates). Outlines should offer a simple tool to assess the quality of a structured generation pipeline.
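A useful metric could be the "logprob margin": at each generation step, the difference between the top token's logprob and the top legal token's logprob.
Because the margin is zero whenever the model's preferred token is already legal, a lower margin (closer to zero) would indicate that the prompt / model is well suited for the structured generation task at hand. A large margin means the constraint is repeatedly forcing the model away from the tokens it would otherwise choose.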
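The snippet below is a minimal sketch of the metric in plain Python. It is not an existing Outlines API. The function names `logprob_margin` and `mean_logprob_margin` are hypothetical. It assumes that for each step you have the raw vocabulary logits and the ids of the tokens the constraint allows, in whatever form the constrained sampler exposes them. Averaging the per-step margins over a generation is also an assumption, since this issue does not specify how to aggregate. Under the definition above, the per-step margin is zero whenever the unconstrained top token is already legal.

```python
import math
from typing import Iterable, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over raw vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def logprob_margin(logits: Sequence[float], legal_token_ids: Iterable[int]) -> float:
    """Hypothetical helper: top token's logprob minus top legal token's logprob.

    Returns 0.0 when the unconstrained argmax is itself a legal token.
    """
    legal = list(legal_token_ids)
    if not legal:
        raise ValueError("at least one legal token is required")
    logprobs = log_softmax(logits)
    top_any = max(logprobs)
    top_legal = max(logprobs[i] for i in legal)
    return top_any - top_legal


def mean_logprob_margin(
    steps: Sequence[Tuple[Sequence[float], Iterable[int]]],
) -> float:
    """Hypothetical aggregate: average margin over (logits, legal_ids) steps."""
    if not steps:
        raise ValueError("need at least one generation step")
    margins = [logprob_margin(logits, legal) for logits, legal in steps]
    return sum(margins) / len(margins)
```

Since log-softmax subtracts the same normalizer from every logit, the margin equals the gap between the largest logit and the largest legal logit. Computing logprobs explicitly keeps the code aligned with the metric's definition and lets the same values be reused for other diagnostics.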
Where does it fit in Outlines?
This feature would fit well in outlines.generate, or as a method in the outlines.models.OutlinesModel base class following a refactor.
Are you willing to open a PR?
Yes, I'd like to submit a PR once I have the time, as this would be a valuable addition.