
Assign Success or Failure to a custom metric

Open FrancoisMasson1990 opened this issue 1 year ago • 5 comments

When using RunEvaluator, it is possible to return an EvaluationResult object with, for example, a key and a score, which are then visible in the platform. But how can we assign a threshold value to this specific score, so that it is marked as Success or Failure (green/red) depending on whether the score is above or below that value? Something similar to what is done for Error Rate %, for example.

FrancoisMasson1990 avatar Feb 11 '24 17:02 FrancoisMasson1990

Great request - it's on the roadmap - would you want this defined in terms of relative performance (to a baseline) or absolute (e.g., 80% shipping threshold)?

hinthornw avatar Feb 12 '24 18:02 hinthornw

Both should be relevant IMO.

FrancoisMasson1990 avatar Feb 12 '24 18:02 FrancoisMasson1990

Cool, makes sense. I can't promise a specific timeline, but thresholds / additional metric interpretation is something we do plan to add.

hinthornw avatar Feb 14 '24 22:02 hinthornw

Hi @hinthornw, any news on that request by any chance 🙏?

FrancoisMasson1990 avatar Mar 20 '24 19:03 FrancoisMasson1990

(responded offline) - it's not currently scheduled for the next couple of weeks. We DO support summary evaluators, so you can define arbitrary conditions over the runs + examples, but it will still be a while before we implement experiment-level or example-level pass/fail conditions.

hinthornw avatar Mar 27 '24 19:03 hinthornw
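
As an interim workaround along the lines suggested above, a summary evaluator can fold an absolute threshold into a single 0/1 score. The following is only a minimal sketch, not an official recipe: the metric key `exact_match_pass`, the 0.8 threshold, and the `"output"` field name are all hypothetical, and it assumes that a summary evaluator is a plain function receiving the lists of runs and examples and returning a dict with a `key` and a `score`. The stand-in objects built with `SimpleNamespace` only mimic the `.outputs` attribute so the snippet runs without the SDK installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List

PASS_THRESHOLD = 0.8  # hypothetical absolute threshold (e.g. an "80% shipping threshold")


def exact_match_pass_fail(runs: List[Any], examples: List[Any]) -> Dict[str, Any]:
    """Return score 1 (pass) if the exact-match rate meets the threshold, else 0 (fail).

    Assumes each run and example exposes an `.outputs` dict with an "output" field;
    that field name is an illustrative assumption, not part of the SDK.
    """
    if not runs:
        return {"key": "exact_match_pass", "score": 0, "comment": "no runs"}

    matches = sum(
        1
        for run, example in zip(runs, examples)
        if (run.outputs or {}).get("output") == (example.outputs or {}).get("output")
    )
    rate = matches / len(runs)
    return {
        "key": "exact_match_pass",
        "score": int(rate >= PASS_THRESHOLD),
        "comment": f"match rate {rate:.2f} vs threshold {PASS_THRESHOLD:.2f}",
    }


# Stand-in data: 4 of 5 predictions match their reference outputs.
demo_examples = [SimpleNamespace(outputs={"output": str(i)}) for i in range(5)]
demo_runs = [SimpleNamespace(outputs={"output": str(i)}) for i in range(4)]
demo_runs.append(SimpleNamespace(outputs={"output": "wrong"}))

demo_result = exact_match_pass_fail(demo_runs, demo_examples)
print(demo_result)
```

In recent SDK versions a function like this can, as far as I know, be supplied through the `summary_evaluators` argument of `evaluate`; check the documentation for the version you are running. This only produces a 0/1 metric for the experiment. It does not give the green/red coloring requested in the original question.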