
How are derived metrics defined?

Open nfrancis-esure opened this issue 10 months ago • 4 comments

Hi, I want to create a derived metric using a mathematical expression, but it's not clear to me how to average the scores between tests. What language is used for the mathematical expressions, and can things like the number of tests be used, e.g. sum(tests) / len(tests)?

nfrancis-esure avatar Feb 19 '25 12:02 nfrancis-esure

Promptfoo uses mathjs to evaluate expressions in derived metrics. You can use standard math operators for operations like averaging. For details, please see the mathjs syntax documentation.

For example, to calculate an average score you might write:

derivedMetrics:
  - name: 'AverageScore'
    value: '(metric1 + metric2 + metric3) / 3'
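Beyond basic operators, mathjs also provides built-in functions such as min, max, and round, so (as a rough sketch reusing the metric1–metric3 names from the example above; the 'WorstCase' name is just illustrative) you could write something like:

derivedMetrics:
  - name: 'WorstCase'  # illustrative name, not from this thread
    value: 'min(metric1, metric2, metric3)'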

I just opened https://github.com/promptfoo/promptfoo/pull/3157 to try to explain them better! Would love your thoughts on it. Let us know if you have any other questions.

mldangelo avatar Feb 20 '25 05:02 mldangelo

Thanks, yes that explains it much better! Two related questions:

  • It doesn't seem possible to add an assertion to derived metrics so that they can pass/fail. Is that correct?
  • I would like to create a derived metric that calculates the average latency across multiple input vars. However, it seems like the derived metric can only access the pass/fail value rather than the latency in milliseconds.

nfrancis-esure avatar Feb 20 '25 09:02 nfrancis-esure

Also, the JavaScript example was great, and I think it could be used for the average latency use case, but it would help if the docs showed how all the metrics, such as accuracy, speed, and difficulty, were defined in the YAML.

nfrancis-esure avatar Feb 20 '25 14:02 nfrancis-esure

@mldangelo tagging in case you didn't get a notification for this. Apologies if you did.

nfrancis-esure avatar Feb 21 '25 09:02 nfrancis-esure

Hi @nfrancis-esure, thanks for the question!

Derived metrics use mathjs syntax for mathematical expressions. The documentation has been expanded since this issue was opened and now covers this in detail:

Derived Metrics Documentation

Quick Answer

For your specific question about averaging: derived metrics operate per prompt, not across all tests. Each named metric accumulates pass rates automatically. For example:

defaultTest:
  assert:
    - type: contains
      value: "expected"
      metric: accuracy  # This creates a named metric

derivedMetrics:
  - name: weighted_score
    value: 'accuracy * 0.6 + relevance * 0.4'  # mathjs syntax

The accuracy metric will automatically show as a percentage (e.g., "85.00% (17/20)") in the results.

For more complex logic, you can use JavaScript functions:

derivedMetrics:
  - name: custom_avg
    value: |
      function(namedScores, evalStep) {
        const { metric1 = 0, metric2 = 0 } = namedScores;
        return (metric1 + metric2) / 2;
      }
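For the latency idea from the earlier comments, a JavaScript function could in principle read timing data, though this is only a sketch: the latencyMs field on evalStep below is an assumption and isn't confirmed anywhere in this thread, so check the promptfoo types for the actual shape.

derivedMetrics:
  - name: latency_seconds
    value: |
      function(namedScores, evalStep) {
        // Hypothetical: assumes evalStep exposes latency in milliseconds as latencyMs.
        const latencyMs = (evalStep && evalStep.latencyMs) || 0;
        return latencyMs / 1000;
      }

Whether that aggregates across test cases the way you want is worth verifying against the derived metrics docs linked above.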


Closing as this is now documented. Feel free to reopen if you have follow-up questions!

mldangelo avatar Dec 07 '25 23:12 mldangelo