Tesseract Open Source OCR Engine (main repository)

Results 218 tesseract issues

```
tesseract -v
tesseract 5.0.0-18-g771c1
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found NEON...
```

bug

Non-linear grayscale normalization ------------------------------------------ ## Draft PR only I would first like to use the draft PR option to get some feedback that the implementation of grayscale normalization works smoothly...

### Environment * **Tesseract Version**: 4.0.0-2 * **Platform**: Linux mx 4.19.0-20-amd64 SMP Debian 4.19.235-1 (2022-03-17) x86_64 GNU/Linux My observation is about the following image: ![1_0](https://user-images.githubusercontent.com/13977359/177207586-20a5f497-8ece-486b-ae0d-2d515fe7a226.jpeg) ### Current Behavior: In ![k20220704-211236](https://user-images.githubusercontent.com/13977359/177207637-bf221851-b7d0-4d4f-8283-b58a9f7a5579.jpg)...

layout analysis

### Environment tesseract v5.0.0-alpha.20191030 Windows 10 64-bit ### Current Behavior: Document (book, 900 dpi, good quality, no noise) with ~10% of words italicized. No italics found in hOCR output. Or,...

question
legacy

https://github.com/tesseract-ocr/tesseract/blob/01535706ecd11a495f9f6e65ee0118b3537a115a/src/textord/strokewidth.cpp#L121-L128 https://github.com/tesseract-ocr/tesseract/blob/2a37f5dd6226156bf32e60d1ae0209a383af4843/src/classify/intproto.cpp#L231-L238 https://github.com/tesseract-ocr/tesseract/blob/912c9978dacf9f7a7cc97ba9a906b40e3df8fc09/src/dict/permdawg.cpp#L111-L121

RFC

### Environment * **Tesseract Version**: v5.0.0 * **Platform**: Windows, 64-bit ### Current Behavior: I am using pytesseract to recognize text as follows: ```python td = pytesseract.image_to_data(img, output_type=Output.DICT) tn_boxes = len(td['level'])...

wontfix
layout analysis

### Environment * **Tesseract Version**: tesseract v5.0.0-alpha.20200328 * **Platform**: Windows, 64-bit ### Current Behavior: ![1](https://user-images.githubusercontent.com/49054714/88778236-d7a10e00-d1ba-11ea-84dd-3ecdedb1a3ea.JPG) While using chi_tra to work on this image, the result was "載", which was great....

allowlist / denylist

Hello, I'm trying to identify some numbers on a page (I need the numbers and their positions). I have tried several different setups but cannot get a better result. My best...

layout analysis

### Environment * **Tesseract Version**: 5.1.0 * **Platform**: Windows 32-bit, compiled under MSVC 2017 ### Current Behavior: I have the following problem: 1. I prepared a custom build for Tesseract...

traineddata

I've tried .traineddata from here https://github.com/tesseract-ocr/tessdata and from here https://github.com/tesseract-ocr/tessdata_fast for some languages like Serbian Latin, Croatian and Turkish and they seemed to have major issues recognizing the "@" character...

traineddata