tesstrain

Feature Request: Character Frequency in Training Text

Open Shreeshrii opened this issue 4 years ago • 4 comments

@stweil How can I get a report like the "Analyse-Report Version 0.1" shown on the NZZ wiki page?

Shreeshrii avatar Dec 20 '20 03:12 Shreeshrii

It looks a bit like the output of https://github.com/eddieantonio/ocreval or https://github.com/impactcentre/ocrevalUAtion, but PosixPath implies that a Python tool produced it.

kba avatar Dec 21 '20 10:12 kba

That report was generated by @JKamlah.

stweil avatar Dec 21 '20 10:12 stweil

Thanks. I have seen such reports as part of the accuracy output from ocreval. Kraken generates them for both the training and the test sets. I think it would be useful to add this to tesstrain.

I found a Python script that generates similar information. It is in the tools directory of https://github.com/cmroughan/kraken_generated-data. https://github.com/wincentbalin/pytesstrain also has some useful tools, which generate a wordlist as well as unigram and bigram frequencies.
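
To make the request concrete, here is a rough sketch of the kind of character frequency report I mean. It is not taken from either repository above and is not part of tesstrain. It uses only the Python standard library, all function names are made up for illustration, and the `*.gt.txt` file pattern is an assumption about how the ground truth files are named.

```python
from collections import Counter
from pathlib import Path
import unicodedata


def char_frequencies(lines):
    """Count every character in an iterable of text lines (newlines excluded)."""
    counts = Counter()
    for line in lines:
        counts.update(line.rstrip("\n"))
    return counts


def bigram_frequencies(lines):
    """Count adjacent character pairs within each line."""
    counts = Counter()
    for line in lines:
        text = line.rstrip("\n")
        counts.update(a + b for a, b in zip(text, text[1:]))
    return counts


def format_report(counts):
    """Render one row per character: count, code point, Unicode name."""
    rows = []
    # Most frequent first; ties are broken by code point order.
    for char, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        name = unicodedata.name(char, "<unnamed>")
        rows.append(f"{n:>8}  U+{ord(char):04X}  {name}")
    return "\n".join(rows)


def read_ground_truth(directory, pattern="*.gt.txt"):
    """Yield the lines of every matching file below `directory`.

    The default pattern is an assumption about the ground truth naming.
    """
    for path in sorted(Path(directory).rglob(pattern)):
        with path.open(encoding="utf-8") as handle:
            yield from handle
```

Something like `print(format_report(char_frequencies(read_ground_truth("path/to/ground-truth"))))` would then list the characters in the training text from most to least frequent, which makes rare or unexpected characters easy to spot. The directory name here is only a placeholder.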

Shreeshrii avatar Dec 23 '20 12:12 Shreeshrii

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 23 '21 05:01 stale[bot]