evaluate
TIMIT typically reports PER, not WER
The docs here mention that TIMIT reports WER, but this dataset typically serves as a benchmark for phone error rate (PER), because it is one of the few resources with manually annotated phone segments. I recommend fixing and clarifying that in the README:
https://github.com/huggingface/evaluate/blob/c1141b02941dc508ca4560b9dfe5b7a90f4cf785/metrics/wer/README.md?plain=1#L68
I think it would be good to draw a clear distinction between word, character, phone, and token error rate (WER/CER/PER/TER) at the library level.
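As a minimal illustration of that distinction (a sketch, not the library's actual implementation): all four metrics are the same edit-distance computation, differing only in the token unit the sequences are split into.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

ref, hyp = "the cat sat", "the cat sag"
wer = error_rate(ref.split(), hyp.split())  # word tokens  -> 1/3
cer = error_rate(list(ref), list(hyp))      # char tokens  -> 1/11
# PER is the same computation over phone symbols (hypothetical ARPAbet-style
# transcription here, just for illustration):
per = error_rate("dh ax k ae t".split(), "dh ax k ae d".split())  # -> 1/5
```

TER for machine translation additionally allows shifts, but for ASR-style metrics the tokenization is essentially the only difference, which is why a shared explanation at the library level would help.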