How are derived metrics defined
Hi, I want to create a derived metric using a mathematical expression, but it's not clear to me how to average the scores across tests. What language is used for the mathematical expressions, and can things like the number of tests be used, e.g. `sum(tests) / len(tests)`?
Promptfoo uses mathjs to evaluate expressions in derived metrics. You can use standard math operators for operations like averaging. For details, please see the mathjs syntax documentation.
For example, to calculate an average score you might write:
```yaml
derivedMetrics:
  - name: 'AverageScore'
    value: '(metric1 + metric2 + metric3) / 3'
```
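Because the expression is handed to mathjs, its built-in functions should be usable as well, not just bare operators. A minimal sketch, using the same placeholder metric names as above:

```yaml
derivedMetrics:
  # Equivalent to the manual sum/divide above, using mathjs's built-in mean()
  - name: 'AverageScore'
    value: 'mean(metric1, metric2, metric3)'
```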
I just opened https://github.com/promptfoo/promptfoo/pull/3157 to try to explain them better! Would love your thoughts on it. Let us know if you have any other questions.
Thanks, yes that explains it much better! Two related questions:
- It doesn't seem possible to add an assertion to derived metrics so that they can pass/fail; is that correct?
- I would like to create a derived metric that calculates the average latency across multiple input vars. However, it seems like the derived metric can only access the pass/fail value rather than the latency in milliseconds.
Also, the JavaScript example was great, and I think it could be used for the average-latency use case, but it would help if the docs showed how all the metrics such as accuracy, speed, and difficulty were defined in the YAML.
@mldangelo tagging in case you didn't get a notification for this. Apologies if you did.
Hi @nfrancis-esure, thanks for the question!
Derived metrics use mathjs syntax for mathematical expressions. The documentation has been expanded since this issue was opened and now covers this in detail.
### Quick Answer
For your specific question about averaging: derived metrics operate per prompt, not across all tests. Each named metric accumulates pass rates automatically. For example (note that both `accuracy` and `relevance` are defined as named metrics on assertions):
```yaml
defaultTest:
  assert:
    - type: contains
      value: "expected"
      metric: accuracy # This creates a named metric
    - type: llm-rubric
      value: "Response is relevant to the question"
      metric: relevance # A second named metric, used below

derivedMetrics:
  - name: weighted_score
    value: 'accuracy * 0.6 + relevance * 0.4' # mathjs syntax
```
The accuracy metric will automatically show as a percentage (e.g., "85.00% (17/20)") in the results.
For more complex logic, you can use JavaScript functions:
```yaml
derivedMetrics:
  - name: custom_avg
    value: |
      function(namedScores, evalStep) {
        const { metric1 = 0, metric2 = 0 } = namedScores;
        return (metric1 + metric2) / 2;
      }
```
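On the average-latency question: the JavaScript form also receives the eval step as its second argument, so something along these lines may be possible. This is an unverified sketch; the `latencyMs` field name is an assumption, so inspect the `evalStep` object in your promptfoo version to see where (or whether) latency is actually exposed:

```yaml
derivedMetrics:
  # Unverified sketch for surfacing latency as a metric. The latencyMs
  # field is an assumption; check the real shape of evalStep first.
  - name: latency_ms
    value: |
      function(namedScores, evalStep) {
        return evalStep?.latencyMs ?? 0;
      }
```

If the per-step latency is reachable this way, the aggregated value shown in the results would effectively be an average across test cases; if it is not, this would need support in promptfoo itself.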
Closing as this is now documented. Feel free to reopen if you have follow-up questions!