Tesseract Open Source OCR Engine (main repository)

Results 218 tesseract issues

root@ubuntu:/home/administrator/tesseract-master# /usr/local/bin/text2image --text=/home/administrator/langdata/chi_sim/chi_sim.training_text --fontconfig_tmpdir=/tmp/font_tmp.HKWX4LOUh0 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=52 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0 --max_pages=0 --font=FreeSerif Bold
Stripped 211 unrenderable words
Rendered page 0 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 222 unrenderable words
Rendered page 1...

bug
training
text2image

Environment:
* **Tesseract Version**: Tesseract 5.0
* **Platform**: Windows 10

The shell:
E:\Tess-OCR\N0>tesseract.exe num.font.exp0.tif num.font.exp0 batch.nochop makebox
Page 1
Page 2
Page 3
E:\Tess-OCR\N0>tesseract.exe num.font.exp0.tif num.font.exp0 -l eng --psm 7 nobatch box.train...

legacy

### Environment
* **Tesseract Version**: 4.1.1
* **Platform**: Linux

### Current Behavior:
I can't fine-tune the Persian language: `failed to load script unicharset from:../langdata_lstm/Inherited.unicharset`. I couldn't find this file...

training

Although it's a free service, we might still care about the waste of power consumption. https://github.community/t/trigger-action-on-schedule-only-if-there-are-changes-to-the-branch/17887/4

CI

For Tesseract 5, what are the best practices regarding OpenMP? Is it still true:
1. OpenMP is **needed** for training, so build tesseract and training tools with `--enable-openmp`.
2. For...

question
OpenMP

AFAICS, `lstmtraining` produces two types of figures for measuring the error:
1. **bag-of-character training error** (on `list.train`): this is shown as
   - `char train=%.3f%%` every 100 iterations
   - `Finished! Error...

feature request
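
To make the first figure in the excerpt above concrete, here is a minimal sketch of one way a bag-of-characters error can be computed: character order is ignored and only per-character counts are compared. This is an illustration under that assumption, not a copy of the `lstmtraining` code; the function name `bag_of_chars_error` and the handling of empty input are hypothetical choices.

```python
from collections import Counter


def bag_of_chars_error(truth: str, ocr: str) -> float:
    """Return an order-insensitive character error rate.

    Assumption: the error is the sum of absolute per-character count
    differences between truth and OCR output, divided by the number of
    characters in the truth string.
    """
    if not truth:
        # Arbitrary convention for this sketch: empty truth is a perfect
        # match only if the OCR output is empty too.
        return 0.0 if not ocr else 1.0
    counts = Counter(truth)   # +1 for every character in the truth
    counts.subtract(ocr)      # -1 for every character in the OCR output
    errors = sum(abs(n) for n in counts.values())
    return errors / len(truth)
```

Because order is ignored, a transposition such as `"abc"` versus `"cba"` scores zero here, which is one reason a bag-of-characters figure can look better than an edit-distance-based one on the same data.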

### Environment
* **Tesseract Version**: Tesseract Open Source OCR Engine v5.0.0.20190623 with Leptonica (downloaded from https://digi.bib.uni-mannheim.de/tesseract/)
* **Platform**: W7 64 bit 6.1.7601

### Call:
"C:\Program Files\Tesseract-OCR\tesseract" scan.tif scan-ocr pdf...

PDF

### Environment
* **Tesseract Version**: 5.0.1
* **Platform**: Windows 10 (64-bit)

### Current Behavior:
The correct output is returned *without* whitelisting enabled, but an empty string is returned *with* whitelisting...

allowlist / denylist
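
When reproducing a report like the one above, a character allowlist is typically passed on the command line with `-c tessedit_char_whitelist=...`. The sketch below only builds the argument list so the two runs (with and without the allowlist) can be compared. The file names, the digits-only allowlist, and the helper name `build_tesseract_command` are assumptions for illustration and are not taken from the issue.

```python
import shlex
from typing import List, Optional


def build_tesseract_command(
    image: str,
    output_base: str,
    allowed_chars: Optional[str] = None,
    psm: Optional[int] = None,
    exe: str = "tesseract",
) -> List[str]:
    """Build a tesseract argv list, optionally restricting output characters."""
    cmd = [exe, image, output_base]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if allowed_chars is not None:
        # tessedit_char_whitelist is set as a config variable through -c.
        cmd += ["-c", f"tessedit_char_whitelist={allowed_chars}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs: print both variants for a side-by-side run.
    print(shlex.join(build_tesseract_command("scan.png", "out_plain", psm=7)))
    print(shlex.join(build_tesseract_command("scan.png", "out_digits", "0123456789", psm=7)))
```

Running both printed commands on the same image is a quick way to check whether the allowlist alone turns a correct result into an empty one.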

I have tested the latest release, 3.05, on Windows to OCR an Arabic document to a searchable PDF. When I select text in the output PDF file, it seems to be stored in the opposite (left...

bug
PDF
RTL

As for ["Best results may be obtained by having a single pattern in the file."](https://tesseract-ocr.github.io/tessdoc/APIExample-user_patterns), it might be a good idea to have an appropriate variable. Something like `user_pattern` or `tessedit_pattern`...

feature request
API