jiwer
Is it possible just to get the number of errors?
I know I could probably just get the WER and multiply it by the number of reference words to get the number of errors, but I was hoping that was unnecessary.
Edit: The reason I am asking is this... I want to roll up all of my sentence-level WERs into an overall document WER.
If you use `jiwer.process_words`, you get a `WordOutput` object. The number of errors is the sum of the `substitutions`, `insertions`, and `deletions` attributes of this object.
If you want an overall WER for a document, you can use the `jiwer.wer_contiguous` transform instead. This allows the number of reference and hypothesis sentences to differ.
For example:
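import jiwer

# `reference` and `hypothesis` are your own lists of sentences;
# with the contiguous transforms they may differ in length.
output = jiwer.process_words(
    reference,
    hypothesis,
    reference_transform=jiwer.wer_contiguous,
    hypothesis_transform=jiwer.wer_contiguous,
)
print(output.wer)  # document-level WER
print(output.substitutions + output.insertions + output.deletions)  # total errors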
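If you already have per-sentence error counts, the roll-up itself is plain arithmetic: the document WER is the total number of errors divided by the total number of reference words, which is generally not the same as the mean of the sentence WERs. A sketch in plain Python, where the helper `document_wer` and the counts are hypothetical and only for illustration:

```python
def document_wer(sentence_counts):
    """Pool (errors, reference_words) pairs into one document-level WER."""
    total_errors = sum(errors for errors, _ in sentence_counts)
    total_ref_words = sum(ref_words for _, ref_words in sentence_counts)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    return total_errors / total_ref_words


# Hypothetical counts: (errors, reference_words) for each sentence
counts = [(1, 2), (1, 10)]

pooled = document_wer(counts)  # 2 errors over 12 reference words
mean_of_wers = sum(e / n for e, n in counts) / len(counts)  # (0.5 + 0.1) / 2

print(pooled, mean_of_wers)
```

The short sentence dominates the mean of the sentence WERs, while the pooled figure weights every reference word equally.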
I asked ChatGPT, and here is the code it generated:
import jiwer
# Assuming 'original_sentences' and 'corrected_sentences' are your lists of sentences
original_sentences = [...] # Your list of original sentences
corrected_sentences = [...] # Your list of corrected sentences
# Calculate total WER for the document
total_wer = jiwer.wer(original_sentences, corrected_sentences)
print(f"Total WER for the document: {total_wer}")
Doesn't that seem easier?
EDIT: ~~That's giving me a number greater than one.~~ Actually I was using the wrong columns to calculate this. It looks good now. Thanks!