Robert Sachunsky
> Should I move this issue to the `tesseract` repository? Please do!
@wollmers > For avoidance of doubt: CER is usually computed based on Levenshtein distance in the narrow sense, which means with allowed edit operations of insert, delete and substitute each...
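To make the narrow-sense definition concrete, here is a minimal sketch of CER via plain Levenshtein distance (insert, delete and substitute, each with cost 1). The function names `levenshtein` and `cer` are only illustrative, and dividing by the reference length is an assumption on my side; it is not what `lstmtraining` or `lstmeval` do internally.

```python
def levenshtein(ref, hyp):
    """Edit distance with insert, delete and substitute, each costing 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate, normalised by reference length (assumption)."""
    if not ref:
        return 0.0 if not hyp else float("inf")
    return levenshtein(ref, hyp) / len(ref)
```

With this normalisation the rate can exceed 1.0 when the hypothesis is much longer than the reference, which is one reason why the choice of denominator should be stated whenever figures are reported.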
No one seems to care that their trained models are selected from suboptimal checkpoints and reported with way-too-optimistic error rates? As long as lstmtraining and lstmeval report figures the way...
@stweil > the current models are not so bad, they could be even better if an appropriate checkpoint was selected – which is a huge waste > it is already possible...
> [Levenshtein language:c++ stars:>50 -license:gpl](https://github.com/search?q=Levenshtein+language%3Ac%2B%2B+stars%3A%3E50+-license%3Agpl) > > Edit: Indeed, RapidFuzz seems to be the best option. Not tagged with that topic, but https://github.com/seqan/seqan3 could be an option, too. @wollmers, agreeing...
> So basically we can split this into two separate tasks: > > 1. Make Tesseract use the eval set to select checkpoints and to determine when to finish training....
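For illustration only, the first task could look roughly like the sketch below: pick the checkpoint with the lowest error on the eval set, and stop once the eval error has not improved for a few evaluations. Everything here is hypothetical (the names `select_checkpoint` and `should_stop`, the `patience` parameter, the checkpoint names and the error values); none of it is an existing Tesseract option or API.

```python
def select_checkpoint(eval_errors):
    """Return the (name, error) pair with the lowest eval error.

    eval_errors: list of (checkpoint_name, eval_error) in training order.
    """
    return min(eval_errors, key=lambda item: item[1])


def should_stop(errors, patience=3):
    """True if the last `patience` eval errors did not beat the earlier best."""
    if len(errors) <= patience:
        return False  # not enough history to decide yet
    best_before = min(errors[:-patience])
    return min(errors[-patience:]) >= best_before


# Hypothetical eval-set error rates per checkpoint, in training order
history = [("ckpt_1", 0.05), ("ckpt_2", 0.03), ("ckpt_3", 0.04),
           ("ckpt_4", 0.04), ("ckpt_5", 0.035)]
best = select_checkpoint(history)
stop = should_stop([err for _, err in history])
```

The point of the sketch is merely that both decisions depend on the eval set alone, not on the training error.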
@stweil, what you describe are external means, though. But the question raised by @amitdo was whether there might be some script solution to address the CER/BCER calculation problem from _within_...
Note: you can set the color of the arrows according to their respective labels ex post by: - keeping track of the artists added to `ax.texts` after `adjust_text`, and then...
@edlemus what do you mean `adjust_text` does not have an attribute? It's just a function on `matplotlib.text.Text` instances. See https://github.com/i008/COCO-dataset-explorer/blob/df82a833163d7acc131eefb5264684cb2cc627b5/vis.py#L116-L119 for example.
That cannot work. Tesseract's image data structure `pix` (from Leptonica) needs to _know_ what format the input is in. Either it's a full byte stream of some standard image format (recognizable...