Robert Sachunsky comments

Results: 735 comments of Robert Sachunsky

Using PSM.AUTO_OSD or default doesn't make any difference

For the record: Tesseract itself is a little weak in documenting this properly. (It happened when transitioning from version 3 to LSTM-based 4.) 1. `OSD` (as in `DetectOrientationScript()` or `DetectOS()`)...

Question on handwriting OCR

Hi @Archilegt, sure, if you have suitable ground truth (i.e. training data, pairs of image and text for individual lines), you can do HTR with Tesseract, too. Modern OCR engines...

Question on handwriting OCR

We are doing something very similar currently – see [here](https://wrznr.github.io/dhh-text-2021) for details (in German). Basically, if you want to follow the above OCR-D-based workflow (or variants of it with different...

how to limit character set in image_to_text?

For legacy models, the effect is there. For LSTM models, these kinds of settings are not constraints but just hints. See issues/documentation in Tesseract itself.
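Since the character-set settings act only as hints for LSTM models, one possible workaround is to filter the recognized text afterwards. The following is a minimal sketch, not something Tesseract or pytesseract provides: `filter_to_charset` is a hypothetical helper name, and it assumes the OCR result is already available as a plain string.

```python
def filter_to_charset(text, allowed, keep_whitespace=True):
    """Drop every character of `text` that is not listed in `allowed`.

    Hypothetical post-processing helper: it does not influence the
    recognition itself, it only discards unwanted characters afterwards.
    """
    allowed_set = set(allowed)
    return "".join(
        ch for ch in text
        if ch in allowed_set or (keep_whitespace and ch.isspace())
    )


# Example: keep only digits and hyphens from some recognized text.
digits_only = filter_to_charset("Tel. 0123-456 x", "0123456789-", keep_whitespace=False)
```

Note that this kind of filtering deletes characters rather than steering the recognizer towards an allowed alternative, so it is not equivalent to a true constraint.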

Tesseract prints characters differ from lstmeval

> By the way, are there any embedded debug support for the `tesseract` app which can be activated?

Yes, you can: [build with debugging enabled](https://tesseract-ocr.github.io/tessdoc/Compiling-%E2%80%93-GitInstallation.html#debug-builds) and then enable any of...

Tesseract prints characters differ from lstmeval

> Why the characters recognized by `lstmeval` and `tesseract` are different? Is it normal?

Yes, it's not unlikely, since the latter is much more complex – e.g. because it contains...

Tesseract prints characters differ from lstmeval

> Is this really a tesstrain issue?

You are right, this should probably be transferred to the tesseract repo.

Tesseract prints characters differ from lstmeval

@jhartungBE all we have at this point are suspicions (what to look for). Have you tried …
- `PSM=13` / `--psm 13`
- with traineddata from `tessdata_best` / without `--convert_to_int`...

Tesseract prints characters differ from lstmeval

@jhartungBE, like I said in my [first comment](https://github.com/tesseract-ocr/tesstrain/issues/110#issuecomment-856294912), the Tesseract standalone CLI has much more than just the bare recognition of lstmeval – and that includes a check and compensation...

Tesseract prints characters differ from lstmeval

Yes, that's what it means. Just install ImageMagick and do a `convert input.png -negate output.png`
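To illustrate what negating an image amounts to, here is a minimal sketch in plain Python. It assumes an 8-bit grayscale image represented as a nested list of integers in the range 0 to 255; the `negate` function is a hypothetical illustration and not how ImageMagick is implemented.

```python
def negate(pixels, max_value=255):
    """Return a negated copy of a grayscale image.

    `pixels` is assumed to be a list of rows, each row a list of
    integer intensities between 0 and `max_value`. Every value v is
    replaced by its complement, max_value - v, so dark becomes light
    and vice versa.
    """
    return [[max_value - v for v in row] for row in pixels]


# Light text (255) on a dark background (0) becomes dark text on light.
light_on_dark = [
    [0, 255, 0],
    [0, 255, 0],
]
dark_on_light = negate(light_on_dark)
```

Applying the operation twice gives back the original image.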