Robert Sachunsky

Results 730 comments of Robert Sachunsky

AFAICT this processor tries to avoid textual noise via separator line detection. There are a couple of (crappy and badly documented) parameters for this (`rular...`), but IMHO your best shot...

@beckstefan is this gone with the reimplementation of the cropper? (If you could post or link to the originals, I could run it...)

I suggest we simply drop the ocrd-anybaseocr-deskew processor. It is incorrect and offers no advantage over the earlier and better ocrd-cis-ocropy-deskew.

Your fix now also allows rendering text that contains (HTML-encoded) newlines but no `SP` elements (or not even multiple distinct `TextLine` elements). See [here](http://digital.slub-dresden.de/idDE-611-BF-82014) for an example....

> although multiple versions could use the same parser

To that: I completely agree, and it seems at least the ALTO parser currently [does already tolerate](https://github.com/OCR-D/core/issues/544#issuecomment-868233760) multiple namespace versions. It...

OK, so in order to utilise multiple cores, I have to wrap the solver (batched or not) in some Pythonic multiprocessing paradigm, right? Since the built-in `threading` module suffers from the GIL...
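A minimal sketch of such a wrapper, assuming the solver is a picklable top-level function and the problems are independent of each other. `solve_one` and `solve_all` are hypothetical names, and the toy body only stands in for the real solver, which is not shown in this thread.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence


def solve_one(costs: Sequence[int]) -> int:
    """Hypothetical stand-in for one solver call: index of the cheapest option."""
    return min(range(len(costs)), key=costs.__getitem__)


def solve_all(problems: Sequence[Sequence[int]], workers: int = 1) -> List[int]:
    """Solve independent problems, optionally spread over worker processes."""
    if workers <= 1:
        # Serial fallback: avoids process start-up and pickling overhead.
        return [solve_one(p) for p in problems]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order; a larger chunksize reduces
        # the number of round trips between parent and worker processes.
        chunksize = max(1, len(problems) // (workers * 4))
        return list(pool.map(solve_one, problems, chunksize=chunksize))
```

With `workers` greater than 1, the call to `solve_all` should sit under an `if __name__ == "__main__":` guard on platforms that start workers by spawning a fresh interpreter. Whether this beats a single batched call probably depends on how the pickling cost compares to the time spent per solve.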

> (Most images in the README do not render there, too.)

Should be fixed by the 2nd commit.

Also, I wonder whether you'd want to mention https://github.com/maxbachmann/RapidFuzz as another fast (C++ based) versatile string alignment library for Python?

Also fixes #9 and #16. Sorry, didn't see #14 and #15 before – probably equivalent.

So if I want multiprocessing in the 1:n scenario (`process.extract`), what would you recommend currently? Is using `process.cdist` with a single-item query going to be better than a custom loop...