
Evaluators for specific tasks

Open IlyasMoutawwakil opened this issue 1 year ago • 5 comments

@regisss would it make sense to add task-specific evaluators? For example for automatic-speech-recognition, as I did manually for Whisper's benchmark.

IlyasMoutawwakil avatar Aug 21 '23 07:08 IlyasMoutawwakil

Sure, why not! Do you have task-specific perf metrics in mind? Which one did you use for the Whisper benchmark?

regisss avatar Aug 21 '23 07:08 regisss

WER (word error rate). It's not very universal, but it's the current standard for speech recognition.

IlyasMoutawwakil avatar Aug 21 '23 08:08 IlyasMoutawwakil
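(For reference: WER is the word-level edit distance between the hypothesis and the reference transcript, normalized by the reference length, i.e. WER = (S + D + I) / N for S substitutions, D deletions, I insertions, and N reference words. A minimal self-contained sketch; the helper name is hypothetical, not part of any library:)

```python
# Hypothetical helper: word-level edit-distance WER, for illustration only.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ≈ 0.33
```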

Ah yes, okay, I thought you were talking about some specific speed metrics. Maybe you can use evaluate for this: https://github.com/huggingface/evaluate

regisss avatar Aug 21 '23 08:08 regisss
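(A minimal sketch of what that could look like with evaluate: the "wer" metric and the "automatic-speech-recognition" task evaluator do exist in the library, but the checkpoint, dataset, and column names below are only illustrative assumptions:)

```python
import evaluate
from datasets import load_dataset
from evaluate import evaluator

# Metric-only usage: compare predicted transcripts against references.
wer = evaluate.load("wer")
score = wer.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on a mat"],
)
print(score)  # fraction of word-level errors, here 1/6 ≈ 0.167

# Task-evaluator usage: runs a pipeline end to end on a dataset.
asr_evaluator = evaluator("automatic-speech-recognition")
results = asr_evaluator.compute(
    model_or_pipeline="openai/whisper-tiny",  # illustrative checkpoint
    data=load_dataset(
        "hf-internal-testing/librispeech_asr_dummy", "clean", split="validation"
    ),
    input_column="audio",  # assumed column names for this dataset
    label_column="text",
    metric="wer",
)
print(results)
```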

Cool, there's already a list of implemented evaluators, including automatic-speech-recognition. Now the question is whether to expose this as a separate benchmark called evaluation, or as an argument of the inference benchmark, like memory. I think the latter makes sense and avoids repeating the same load/optimization/quantization workload (see the sketch below).

IlyasMoutawwakil avatar Aug 28 '23 04:08 IlyasMoutawwakil
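(A rough sketch of that second option. None of these function or argument names exist in optimum-benchmark; they are hypothetical and only illustrate running an optional evaluation pass on the same already-loaded, optimized, and quantized pipeline:)

```python
import time

from evaluate import evaluator


def measure_latency(pipeline, sample, n_runs: int = 10) -> float:
    # Stand-in for the existing latency measurement.
    start = time.perf_counter()
    for _ in range(n_runs):
        pipeline(sample)
    return (time.perf_counter() - start) / n_runs


def run_inference_benchmark(pipeline, task, sample, eval_dataset=None, metric=None):
    # Perf measurement on the already-loaded/optimized/quantized pipeline.
    results = {"latency_s": measure_latency(pipeline, sample)}
    # Opt-in quality evaluation, analogous to the `memory` flag:
    # reuses the same pipeline instead of reloading the model.
    if eval_dataset is not None:
        results["evaluation"] = evaluator(task).compute(
            model_or_pipeline=pipeline,
            data=eval_dataset,
            metric=metric,  # e.g. "wer" for automatic-speech-recognition
        )
    return results
```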

I agree, the latter seems better from a UX point of view :+1:

regisss avatar Aug 28 '23 07:08 regisss