tesseract
Tesseract Open Source OCR Engine (main repository)
As OCR development moves towards LSTM, new difficulties arise, such as handling lstm/traineddata models created for different fonts. **Example situations:** - I have created multiple lstm/traineddata models,...
Font detection works fine in PSM_SINGLE_WORD mode. In PSM_SINGLE_LINE mode it does not work well. For larger amounts of text (column, full page), font detection does not work at all. For...
Tesseract Version: 5. OS: Linux (Ubuntu 20). Command: `tesseract imagename --dpi 300 -l spa --psm 6 --oem 1`. Tried different psm values; value 6 offers the best results. **original...
### Environment tesseract v5.0.0-alpha.20191030 Windows 64bit ### Current Behavior: When in text mode, each line in a paragraph is separated by line-break: ``` Line 1 of foo or bar, another...
### Environment * **Tesseract Version**: 4.0.0 ~~4.0.0-beta.1 from https://packages.debian.org/stretch-backports/tesseract-ocr~~ * **Commit Number**: 51316994ccae0b48692d547030f26c0969308214 ~~c3ed6f036064e54e34f75275f66c70dd924527bf~~ * **Platform**: `Linux my-machine 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux` ### Current Behavior: Tesseract...
----------------------- ### Environment v5.0.0 alpha Windows 10-64 bit ### Current Behavior: I have two very similar documents seen here and here: In the former, Tesseract with psm=1 parses out two...
According to Tesseract 4.0.0 [Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400) : > Added a new OCR engine that uses neural network system based on LSTMs, with major accuracy gains. My testing with this new...
I tested 4.0 with the Telugu language and observed that many words in between are being dropped. I gave it a 300 DPI PNG file. Is this a known issue? If...
Before you submit an issue, please review [the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md). Please report an issue only for a BUG, not for asking questions. Note that it will be much...
### Environment * **Tesseract Version**: Tesseract v5.0.0-alpha-20210401 * **Commit Number**: 38f0fdc * **Platform**: Linux/Debian ### Current Behavior: Repeats parts of preceding or following line. Looks like some memory constructs are...
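Several of the reports above compare results across `--psm` values by hand. A minimal Python sketch of how such a sweep could be scripted is shown here. It only builds the command lines and does not run Tesseract. The helper names, the `page.png` file name, the chosen psm values, and the use of `stdout` as the output base are assumptions made for illustration; the flags `--dpi`, `-l`, `--psm`, and `--oem` are the ones quoted in the report.

```python
import shlex


def build_tesseract_command(image, psm, lang="spa", dpi=300, oem=1,
                            output_base="stdout"):
    """Return the argv list for a single tesseract run.

    output_base="stdout" is an assumption: the command quoted in the
    report does not show an output base.
    """
    return [
        "tesseract", image, output_base,
        "--dpi", str(dpi),
        "-l", lang,
        "--psm", str(psm),
        "--oem", str(oem),
    ]


def commands_for_psm_sweep(image, psm_values=(3, 4, 6, 11)):
    """Map each page segmentation mode to the command that would test it."""
    return {psm: build_tesseract_command(image, psm) for psm in psm_values}


if __name__ == "__main__":
    # "page.png" is a hypothetical input file.
    for psm, argv in commands_for_psm_sweep("page.png").items():
        print(psm, shlex.join(argv))
```

Each argv list can be passed to `subprocess.run(argv, capture_output=True, text=True)` on a machine where Tesseract and the language data are installed, which makes it easier to attach comparable outputs to a bug report.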