Stefan Weil

Results: 1215 comments by Stefan Weil

I created a separate pull request #6242 for the missing override annotations. Hopefully this makes it easier to get some CodeQL issues fixed.

This pull request is an alternative to #5978.

With this PR, further simplifications are possible because several scripts exist more than once in the repository; some, like `script_deleteSymLink.sh`, even have three copies. All of those duplicates could be removed.

As far as I can see, only the 2nd points at a real problem.

The division is conditional, and the condition is never true if the difference is 0 (unless there is a negative `line_size`, which should never happen).

The division is only executed if `last_x - left_x > block->line_size * 2`. A division by zero would occur if `last_x - left_x == 0`, that means only if `0...
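The argument can be illustrated with a minimal sketch. It is not the actual Tesseract code: the function name `GuardedDivide` and the `numerator` parameter are hypothetical, and only `left_x`, `last_x` and `line_size` are taken from the comment. It assumes `line_size` is never negative.

```cpp
#include <cassert>

// Hypothetical stand-in for the guarded division discussed above.
// Returns true and stores the quotient in *result only if the guard holds.
bool GuardedDivide(int numerator, int left_x, int last_x, int line_size,
                   int* result) {
  assert(line_size >= 0);  // assumption: a negative line_size never happens
  const int diff = last_x - left_x;
  if (diff > line_size * 2) {
    // diff > 2 * line_size >= 0, so diff is strictly positive here
    // and the division cannot be a division by zero.
    *result = numerator / diff;
    return true;
  }
  return false;  // guard not satisfied: no division is performed
}
```

With `left_x == last_x` the difference is 0, the guard `0 > line_size * 2` is false for every non-negative `line_size`, and the division is skipped.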

@lackner-codefy, did you also test with models from [`tessdata`](https://github.com/tesseract-ocr/tessdata)? Do they produce results similar to those of the "best" models? And can you say more about your specific use case? For some...

> probably using 32-bit floating point values

Tesseract 4 used `double` precision (64 bit) values. The current "best" models therefore still provide 64 bit values which are converted to `float`...
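As a rough illustration of what such a `double` to `float` conversion involves, here is a minimal sketch. It is not how Tesseract loads its models: the helper `NarrowToFloat` and the idea of a plain weight vector are assumptions made for the example.

```cpp
#include <vector>

// Hypothetical helper: narrows 64-bit values to 32-bit floats.
std::vector<float> NarrowToFloat(const std::vector<double>& weights) {
  std::vector<float> out;
  out.reserve(weights.size());
  for (double w : weights) {
    // The cast rounds each value to a representable float,
    // so some precision of the stored double is lost.
    out.push_back(static_cast<float>(w));
  }
  return out;
}
```

Values such as 0.5 survive the conversion unchanged, while a value such as 0.1 ends up slightly different from the stored `double`.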

`tesseract page.png page` does not find any text, but `tesseract page.png page -c thresholding_method=1` does after about 12 seconds. I assume that page segmentation mode 11 simply takes a lot...

The default is `--thresholding_method=0` (Tesseract's own implementation of Otsu). And your original command finishes after 23:27 minutes on my MacBook, so Tesseract does not hang, but simply takes a long time....