tonic_validate

Add multiple runs per question and report average/stdev

Open • ghost opened this issue 1 year ago • 0 comments

Hello, please add the ability to run each question a fixed number of times instead of once, and report the average and standard deviation of all metrics (perhaps also min/max or some sort of histogram). That would reduce the impact of outliers in the testing process, such as network connection issues or LLM temperature effects.
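To make the request concrete, here is a minimal sketch of the aggregation I have in mind. It does not use any tonic_validate API: `summarize_runs`, `score_once`, and `n_runs` are hypothetical names, and `score_once` stands in for whatever call scores a single question and returns a mapping from metric name to score.

```python
import statistics
from typing import Callable, Dict, List


def summarize_runs(
    score_once: Callable[[str], Dict[str, float]],  # hypothetical scoring callable
    question: str,
    n_runs: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Score one question n_runs times and summarize each metric."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")

    # Collect one list of scores per metric name across all runs.
    samples: Dict[str, List[float]] = {}
    for _ in range(n_runs):
        for metric, value in score_once(question).items():
            samples.setdefault(metric, []).append(value)

    summary: Dict[str, Dict[str, float]] = {}
    for metric, values in samples.items():
        summary[metric] = {
            "mean": statistics.mean(values),
            # Sample stdev needs at least two values; report 0.0 otherwise.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary
```

Something along these lines, applied per question and per metric, would make it possible to tell a genuinely low score from a one-off failure.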

ghost • Apr 29 '24 15:04