tonic_validate
Add multiple runs per question and report average/stdev
Hello, please add the ability to run each question a fixed number of times instead of once, and report the average and standard deviation of every metric (perhaps also min/max or some sort of histogram). That would reduce the influence of outliers in the testing process, such as network connection issues or the effect of LLM temperature.
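To make the request concrete, here is a rough sketch of the aggregation I have in mind. It does not use any tonic_validate API: `score_once` is a hypothetical callable standing in for whatever scores a single question and returns a metric-name-to-value mapping, and `aggregate_runs` and `n_runs` are names made up for illustration.

```python
import statistics
from typing import Callable, Dict, List


def aggregate_runs(
    score_once: Callable[[str], Dict[str, float]],  # hypothetical scorer, not a library function
    question: str,
    n_runs: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Score one question n_runs times and summarize each metric."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")

    # Collect every run's value per metric name.
    samples: Dict[str, List[float]] = {}
    for _ in range(n_runs):
        for metric, value in score_once(question).items():
            samples.setdefault(metric, []).append(value)

    # Reduce each metric's samples to summary statistics.
    summary: Dict[str, Dict[str, float]] = {}
    for metric, values in samples.items():
        summary[metric] = {
            "mean": statistics.mean(values),
            # Sample stdev needs at least two values; report 0.0 for a single run.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary
```

Keeping the raw per-run values around (rather than only the summary) would also make a histogram possible later.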