optimum-benchmark
Evaluators for specific tasks
@regisss would it make sense to add task-specific evaluators? For example for `automatic-speech-recognition`, as I did this manually when I ran Whisper's benchmark.
Sure, why not! Do you have task-specific perf metrics in mind? Which one did you use for the Whisper benchmark?
WER (word error rate). It doesn't generalize to other tasks, but it's the current standard for speech recognition.
Ah yes okay, I thought you were talking about some specific speed metrics.
Maybe you can use `evaluate` for this: https://github.com/huggingface/evaluate
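A minimal sketch of what that looks like, assuming the standard `evaluate` API (the transcripts below are made up):

```python
import evaluate

# Load the WER metric from the evaluate library.
wer = evaluate.load("wer")

# WER = (substitutions + deletions + insertions) / words in the reference.
score = wer.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on a mat"],
)
print(score)  # 1 substitution over 6 reference words -> ~0.167
```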
Cool, there's already a list of implemented evaluators, including `automatic-speech-recognition`.
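For reference, the task evaluator wraps inference and metric computation in one call. A sketch following the pattern in the `evaluate` docs; the model, dataset, and column names here are illustrative assumptions:

```python
from datasets import load_dataset
from evaluate import evaluator

# Task-specific evaluator: runs the pipeline and computes the metric.
task_evaluator = evaluator("automatic-speech-recognition")

# Small validation slice; the column names depend on the dataset used.
data = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="validation[:40]")

results = task_evaluator.compute(
    model_or_pipeline="openai/whisper-tiny.en",
    data=data,
    input_column="path",
    label_column="sentence",
    metric="wer",
)
print(results)  # e.g. {"wer": ...}
```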
Now the question is whether to expose this as a separate benchmark called `evaluation`, or as an argument of `inference` like `memory`. I think the latter makes sense and avoids repeating the same load/optimization/quantization workload.
I agree, the latter seems better from a UX point of view :+1:
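To illustrate the second option, here is a rough sketch (hypothetical flow, not optimum-benchmark's actual API) of how an `evaluation` flag could reuse the pipeline the inference benchmark already built:

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

# Build the pipeline once; this is where any optimization/quantization
# from the benchmark config would already have been applied.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

# -- inference benchmark step --
# ... existing latency/memory tracking runs on `asr` here ...

# -- hypothetical `evaluation` step: task metric on the same pipeline --
data = load_dataset("librispeech_asr", "clean", split="validation[:8]")
predictions = [asr(example["audio"])["text"].lower() for example in data]
references = [text.lower() for text in data["text"]]

wer = evaluate.load("wer")
print(wer.compute(predictions=predictions, references=references))
```

This keeps the accuracy check tied to the exact optimized model being measured, instead of reloading it in a separate evaluation benchmark.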