Robert Sachunsky

735 comments by Robert Sachunsky

You can use `SetImageBytes` instead of `SetImage`. But there are some functions that have no raw alternative (e.g. `Get*Image`, `ProcessPage`). Tesseract internally uses Leptonica data structures for image data, so...

@mlorenzo-alice your code looks correct for grayscale AFAICT. The `len(np_image.shape) > 2` branch is not as convincing (because you can still have e.g. `RGBA`, `RGB`, `LA`, `L` channels), but I...
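
To make the channel question concrete, here is a minimal sketch of how the size arguments for a raw-buffer call could be derived from an array shape. The helper name `set_image_bytes_args` is hypothetical (it is not part of `tesserocr`), and it assumes a C-contiguous, 8-bit array without row padding. It only counts channels; whether Tesseract accepts every channel count (e.g. `LA`) is not checked here.

```python
def set_image_bytes_args(shape, itemsize=1):
    """Derive (width, height, bytes_per_pixel, bytes_per_line) from an array shape.

    Hypothetical helper: assumes a C-contiguous buffer with no row padding
    and `itemsize` bytes per channel sample (1 for uint8).
    """
    if len(shape) == 2:
        # Single-channel image, e.g. mode L
        height, width = shape
        channels = 1
    elif len(shape) == 3:
        # Multi-channel image: LA = 2, RGB = 3, RGBA = 4 channels
        height, width, channels = shape
    else:
        raise ValueError("expected a 2-D or 3-D shape, got %r" % (shape,))
    if channels not in (1, 2, 3, 4):
        raise ValueError("unexpected channel count: %d" % channels)
    bytes_per_pixel = channels * itemsize
    bytes_per_line = width * bytes_per_pixel
    return width, height, bytes_per_pixel, bytes_per_line


# Intended use (assumption, not executed here), with `np_image` a uint8 array:
#   api.SetImageBytes(np_image.tobytes(), *set_image_bytes_args(np_image.shape))
```

The point of spelling it out is that `len(shape) > 2` alone does not tell you which of the multi-channel modes you have; the last dimension does.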

> Yes a method which accepts an image as a byte string ... > if the above logic for calling `SetImageBytes` from an array is generic for all images then...

> The `SetImageBuffer` method would look something like this:
> ...
> That's just a copy of `SetImage` but without the call to `_image_buffer`.

Oh, now I got it. Yes,...

> `PIL.Image` objects and it seems you don't have it installed.

IMHO `tesserocr` should have an `install_requires` with `PIL.Image`. I know you want to be more tolerant/flexible: https://github.com/sirfz/tesserocr/blob/1ba079f89a340187612e32258e58c0b88fa987ab/tesserocr.pyx#L26-L30 And you...

BTW, having a global `pil_installed = True` when import succeeds, as in `tests/test_api.py` would also be useful.

> Or maybe a global switch to disable Pillow and purely return raw...
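
For illustration, the optional-import pattern with a global flag could look roughly like the sketch below. This is not a copy of the `tesserocr` source or of `tests/test_api.py`; the helper `require_pil` is hypothetical and only shows how such a flag might be consumed.

```python
# Optional dependency: record whether Pillow could be imported.
try:
    from PIL import Image
    pil_installed = True
except ImportError:
    Image = None
    pil_installed = False


def require_pil():
    """Return the PIL.Image module, or fail with a clear message.

    Hypothetical helper: callers that need Pillow objects check the
    global flag instead of hitting an obscure error later on.
    """
    if not pil_installed:
        raise RuntimeError("Pillow is required for this operation")
    return Image
```

Code paths that only deal with raw bytes can then skip the check entirely.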

> > Finally, how about _also_ interfacing with `pixa` objects natively from Python via [jsbueno/pyleptonica#11](https://github.com/jsbueno/pyleptonica/pull/11)? > > I did contemplate also mapping some leptonica functionality at the beginning (Pix being...

> I check for words' confidence (word level) and then want to go to character level in separate iterator in case confidence is not sufficient.

You don't have to. You...

@sirfz,

> You can always try multiprocessing instead but I don't think it's gonna be any different, theoretically.

I disagree. Python's global interpreter lock (GIL) should prevent any true...

> I disagree. Python's global interpreter lock (GIL) should prevent any true parallelism theoretically – so preemptive multitasking would only yield a speedup by better multiplexing blocking I/O operations.
>...
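
The threads-versus-processes distinction can be tried out with a small, generic sketch. It uses a pure-Python CPU-bound function as a stand-in, not Tesseract. The names `cpu_bound` and `run_all` are made up for this example, and nothing here says how `tesserocr` itself behaves, since C extensions may release the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python busy work: sum of squares below n (holds the GIL)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_all(executor_cls, inputs, workers=2):
    """Map cpu_bound over inputs using the given executor class."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, inputs))


# Threads: the results are correct, but for pure-Python work like this
# the GIL lets only one thread execute bytecode at a time.
results = run_all(ThreadPoolExecutor, [1000, 2000, 3000])
```

Passing `concurrent.futures.ProcessPoolExecutor` to `run_all` instead gives each worker its own interpreter and its own GIL; when run as a script, that call should sit under an `if __name__ == "__main__":` guard. Timing both variants on the real workload is the only reliable way to settle the question.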