Robert Sachunsky comments

735 comments by Robert Sachunsky

setup: add repo URL

> I have not looked into the details, but https://github.com/maxbachmann/RapidFuzz looks more like a string distance computation without any alignments. It implements its own fast Needleman-Wunsch alignment (based on Hyyrö's algorithm or Wagner-Fischer)...
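For comparison, here is a minimal Wagner-Fischer sketch of the kind of distance-only computation meant above. It assumes unit costs, the function name `levenshtein` is hypothetical, and this is not RapidFuzz's implementation. Because it keeps only two rows of the dynamic-programming matrix, it returns the distance but cannot recover an alignment path.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance via Wagner-Fischer (hypothetical helper).

    Only two rows of the DP matrix are kept, so memory is O(len(b)),
    but the alignment path itself is discarded.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Recovering the alignment itself would require keeping the full matrix (or a traceback structure), which is exactly what distance-only libraries avoid.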

Feature: Convert edit distance to ratio/similarity

> the normalized Levenshtein distance, which is calculated as:
>
> ```
> 1 - lev_dist / max(len1, len2)
> ```

[...]

> However, in editdistance it should be simple...
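The quoted formula can be sketched as follows, assuming a unit-cost Levenshtein distance. Both helper names are hypothetical, and returning 1.0 for two empty strings is an assumption, since the formula itself would divide by zero there.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (Wagner-Fischer, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - lev_dist / max(len1, len2), a ratio in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest


print(normalized_similarity("kitten", "sitting"))  # 1 - 3/7
```

Dividing by max(len1, len2) keeps the value in [0, 1], because the unit-cost distance never exceeds the length of the longer string.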

Feature: Convert edit distance to ratio/similarity

> I am a complete noob on these OCR topics. I simply needed a fast implementation and found none, so I built my own ;) Looks promising, will have...

Return edit operations too

IIUC returning the total length of the alignment path (i.e. insertions, deletions, substitutions, identities) is also necessary to calculate a correct (unbiased) accuracy / error rate. (Using the length of...
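To illustrate the argument, here is a minimal sketch that keeps the full DP matrix, backtraces one optimal alignment path, counts its operations, and divides the errors by the path length. It assumes unit costs; `align_counts` and `error_rate` are hypothetical names, not the API of editdistance, RapidFuzz or any other library mentioned here.

```python
from typing import Dict


def align_counts(a: str, b: str) -> Dict[str, int]:
    """Count operations on one optimal unit-cost alignment path (hypothetical)."""
    n, m = len(a), len(b)
    # The full matrix is needed so the path can be traced back afterwards.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    counts = {"identity": 0, "substitution": 0, "deletion": 0, "insertion": 0}
    i, j = n, m
    # Diagonal moves are preferred on ties, so with several optimal paths
    # the resulting alignment length depends on this tie-breaking.
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            counts["identity" if a[i - 1] == b[j - 1] else "substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


def error_rate(a: str, b: str) -> float:
    """Errors divided by the total alignment path length (not by max length)."""
    c = align_counts(a, b)
    errors = c["substitution"] + c["deletion"] + c["insertion"]
    length = errors + c["identity"]  # insertions, deletions, substitutions, identities
    return errors / length if length else 0.0


print(align_counts("kitten", "sitting"))
```

For "kitten" vs. "sitting" this finds 4 identities, 2 substitutions and 1 insertion, i.e. 3 errors over an alignment path of length 7.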

training fail again and again

> The current version of tesstrain requires users to run `make tesseract-langdata` before running the training. Older versions of tesstrain did not require this additional step which explains that there...

Build fails for MacOS (ocrd-fork-pycocotools)

No idea. Note that `ocrd-fork-pycocotools` is https://github.com/bertsky/cocoapi – I only added a few fixes. Meanwhile, I have merged from upstream. Please try again now (I have made a release 2.0.6.post1)....

Build fails for MacOS (ocrd-fork-pycocotools)

I don't know what `in-tree-build` is. So you are saying that the MacOS build works if you install manually? Or that you can pip install from the new source tarball on PyPI?

Build fails for MacOS (ocrd-fork-pycocotools)

Ah, got it. Hard to tell from here. But could you try `python setup.py build_ext install` (which is what upstream's upstream uses)?

Build fails for MacOS (ocrd-fork-pycocotools)

Another [thing](https://github.com/cocodataset/cocoapi/issues/473) you could try: setting `ARCHFLAGS="-arch x86_64"` during compilation.

frak models in ocrd resmgr

That's strange indeed. It is not what the vanilla tesstrain rules would produce (even the fast variant just does ConvertToInt). And the concrete wordlist looks very awkward (contains 400k full forms,...