tesstrain
Feature Request: Character Frequency in Training Text
This looks a bit like the output of https://github.com/eddieantonio/ocreval or https://github.com/impactcentre/ocrevalUAtion, but the PosixPath in it implies that a Python tool produced it.
That report was generated by @JKamlah.
Thanks. I have seen such reports as part of accuracy output from ocreval. Kraken generates them for both the training and testing sets. I think it will be useful to add it as part of tesstrain.
I found a Python script that generates similar information. It is in the tools directory of https://github.com/cmroughan/kraken_generated-data. https://github.com/wincentbalin/pytesstrain also has some useful tools that generate a wordlist as well as unigram and bigram frequencies.
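For illustration, here is a minimal sketch of what such a character frequency report could look like. It is not taken from either of the repositories above. The function names (char_frequencies, read_ground_truth, format_report) are hypothetical, and the sketch assumes the training text is stored as UTF-8 line transcriptions matching *.gt.txt in a single directory.

```python
from collections import Counter
from pathlib import Path
import unicodedata


def char_frequencies(lines):
    """Count character unigrams and bigrams over an iterable of text lines."""
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        text = line.rstrip("\n")  # the line terminator is not part of the text
        unigrams.update(text)
        # Adjacent character pairs within a line; pairs never span two lines.
        bigrams.update(a + b for a, b in zip(text, text[1:]))
    return unigrams, bigrams


def read_ground_truth(directory, pattern="*.gt.txt"):
    """Yield every line from the transcription files in `directory`.

    Assumption: one directory of UTF-8 text files matching `pattern`.
    """
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            yield from handle


def format_report(unigrams):
    """Return one tab-separated row per character, most frequent first."""
    rows = []
    for char, count in sorted(unigrams.items(), key=lambda kv: (-kv[1], kv[0])):
        # Some code points (e.g. control characters) have no Unicode name.
        name = unicodedata.name(char, "<unnamed>")
        rows.append(f"{count}\t{char!r}\t{name}")
    return "\n".join(rows)
```

Calling format_report(char_frequencies(read_ground_truth(some_dir))[0]) on a ground truth directory would then print one row per character. Listing the Unicode name next to each character makes it easier to spot look-alike glyphs and characters that are rare in the training text.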
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.