langsmith-sdk
Assign Success or Fail to own custom metric
When using RunEvaluator it is possible to return an EvaluationResult object with, for example, a key and a score that are visible in the SDK platform. But how can we assign a threshold value for this specific score, so that it shows Success or Failure (with green/red color) depending on whether the score is above or below that value? Something similar to what is done for Error Rate %, for example.
Great request - it's on the roadmap - would you want this defined in terms of relative performance (to a baseline) or absolute (e.g., 80% shipping threshold)?
Both should be relevant IMO.
Cool, makes sense. I can't promise a specific timeline, but thresholds / additional metric interpretation is something we do plan to add.
Hi @hinthornw, any news on that request by any chance 🙏?
(responded offline) - it's not currently scheduled for the next couple of weeks. We DO support summary evaluators, so you can define arbitrary conditions over the runs+examples, but still a bit before we implement experiment or example-level pass/fail conditions
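In the meantime, one workaround is to encode the pass/fail verdict yourself when building the evaluation result: compare the raw score against your threshold and emit the verdict as its own field alongside the score. The sketch below uses a plain dict so it is self-contained; in a real RunEvaluator you would return the same fields via EvaluationResult. The helper name `score_with_threshold` and the 0.8 cutoff are illustrative, not part of the SDK.

```python
def score_with_threshold(key: str, score: float, threshold: float = 0.8) -> dict:
    """Wrap a raw score in a result that also carries a pass/fail verdict.

    In a LangSmith evaluator you would return these fields on the
    EvaluationResult; a dict stands in here to keep the sketch runnable
    without the SDK installed.
    """
    return {
        "key": key,
        "score": score,
        # Encode the verdict as a separate field so it is visible per run.
        "value": "Success" if score >= threshold else "Failure",
    }

passing = score_with_threshold("accuracy", 0.91, threshold=0.8)
failing = score_with_threshold("accuracy", 0.54, threshold=0.8)
```

The same comparison can live in a summary evaluator instead, aggregating scores over all runs+examples and emitting a single experiment-level Success/Failure metric.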