jiwer

Is it possible just to get the number of errors?

Open SteveBraich opened this issue 1 year ago • 3 comments

Is it possible just to get the number of errors?

I know I could probably just get the WER and multiply it by the number of reference words to get the number of errors, but I was hoping that would be unnecessary.

Edit: The reason I am asking is that I want to roll up all of my sentence-level WERs into an overall document WER.

SteveBraich, Feb 13 '24 06:02

If you use jiwer.process_words, you get a WordOutput object. The number of errors is the sum of the substitutions, insertions, and deletions attributes of this object.

nikvaessen, Feb 13 '24 07:02

If you want the overall WER of a document, you can use the wer_contiguous transform instead. It joins all sentences into a single sequence of words before alignment, so the number of reference and hypothesis sentences is allowed to differ.

For example:

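import jiwer

# reference and hypothesis are lists of sentences (strings);
# with wer_contiguous the two lists may have different lengths
output = jiwer.process_words(
    reference,
    hypothesis,
    reference_transform=jiwer.wer_contiguous,
    hypothesis_transform=jiwer.wer_contiguous,
)
print(output.wer)  # overall WER of the document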

nikvaessen, Feb 13 '24 07:02

I asked ChatGPT, and here is the code it generated:

import jiwer

# Assuming 'original_sentences' and 'corrected_sentences' are your lists of sentences
original_sentences = [...]  # Your list of original sentences
corrected_sentences = [...]  # Your list of corrected sentences

# Calculate total WER for the document
total_wer = jiwer.wer(original_sentences, corrected_sentences)

print(f"Total WER for the document: {total_wer}")

Doesn't that seem easier?

EDIT: ~~That's giving me a number greater than one.~~ Actually I was using the wrong columns to calculate this. It looks good now. Thanks!

SteveBraich, Feb 13 '24 07:02