
Word error rate added in your release 4.3.10

Open soniasol opened this issue 6 months ago • 2 comments

Hello,

In your release 4.3.10, you mention that 'Word' error rate has been added as a validation metric in recognition training. Is it possible to get the WER score in the test report?

https://github.com/mittagessen/kraken/releases/tag/4.3.10

Thank you and have a nice day, Sonia

soniasol, Dec 21 '23 13:12

Ah sorry, it is only calculated for validation during training. I can add it to the test report as well, but the method is rather simplistic, as it just treats anything separated by whitespace as a separate word.

mittagessen, Dec 27 '23 23:12
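
For readers unfamiliar with the metric, a whitespace-based WER of the kind described in the reply above could be sketched roughly as follows. This is an illustrative sketch only, not kraken's actual implementation: the function names `edit_distance`, `error_rate`, `wer`, and `cer` are hypothetical, and the handling of an empty reference is a convention chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length."""
    if not ref_tokens:
        # Convention for this sketch: an empty reference scores 0.0 only
        # if the hypothesis is empty as well, otherwise 1.0.
        return 0.0 if not hyp_tokens else 1.0
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: a 'word' is anything separated by whitespace."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate over the raw strings, for comparison."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the quikc brown fox"
    print(f"WER: {wer(truth, ocr):.3f}")
    print(f"CER: {cer(truth, ocr):.3f}")
```

Because splitting is purely on whitespace, punctuation attached to a word (e.g. "fox," vs. "fox") counts as a full word error, which is one reason the approach is called simplistic.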

@mittagessen thank you so much for the swift reply!

I see your point, but I think it could be useful to have both CER and WER, in addition to the accuracy score.

For instance, in my case (but I am sure this applies to many people using Kraken!), we are going to use the OCR outputs as input to tokenization, lemmatization, normalization (e.g., Old French to contemporary French) and so on. Therefore, having a metric that measures errors at the word level would be very helpful!

Do you think you could add the WER to the test report?

Thank you very much again for your work on Kraken 🙃 Have a nice day, Sonia.

soniasol, Jan 02 '24 10:01

It's now implemented at a global (i.e. not per-script) level. I'll add it to the next minor release, 5.2.2.

mittagessen, Apr 22 '24 15:04