tesseract issues

218 results, sorted by recently updated

text2image Null box at index 0

root@ubuntu:/home/administrator/tesseract-master# /usr/local/bin/text2image --text=/home/administrator/langdata/chi_sim/chi_sim.training_text --fontconfig_tmpdir=/tmp/font_tmp.HKWX4LOUh0 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=52 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0 --max_pages=0 --font=FreeSerif Bold
Stripped 211 unrenderable words
Rendered page 0 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 222 unrenderable words
Rendered page 1...

YiWenFY

Labels: bug, training, text2image

Warning: No shape table file present: shapetable

Environment:
* **Tesseract Version**: Tesseract 5.0
* **Platform**: Windows 10
The shell:
E:\Tess-OCR\N0>tesseract.exe num.font.exp0.tif num.font.exp0 batch.nochop makebox
Page 1
Page 2
Page 3
E:\Tess-OCR\N0>tesseract.exe num.font.exp0.tif num.font.exp0 -l eng --psm 7 nobatch box.train...

skyfire88

Labels: legacy

Inherited.unicharset

### Environment
* **Tesseract Version**: 4.1.1
* **Platform**: Linux
### Current Behavior:
I can't fine-tune for the Persian language; training fails with `failed to load script unicharset from:../langdata_lstm/Inherited.unicharset`. I couldn't find this file...

typeoo

Labels: training

CI: Don't run a daily job when no commit was pushed in the last 24 hours

Although it's a free service, we might still want to avoid wasting power. https://github.community/t/trigger-action-on-schedule-only-if-there-are-changes-to-the-branch/17887/4

amitdo
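The request boils down to a gate that the scheduled job evaluates first: read the timestamp of the newest commit and skip the rest of the job if it is older than 24 hours. Below is a minimal Python sketch of that gate, assuming the job has a git checkout and reads the committer time with `git log -1 --format=%ct`; the function names are hypothetical and this is not the workflow from the linked thread.

```python
import subprocess

WINDOW_SECONDS = 24 * 60 * 60  # run only if something was pushed in the last 24 hours


def has_recent_commit(last_commit_epoch: int, now_epoch: int,
                      window: int = WINDOW_SECONDS) -> bool:
    """True if the newest commit is less than `window` seconds older than `now_epoch`."""
    return now_epoch - last_commit_epoch < window


def last_commit_epoch(repo_dir: str = ".") -> int:
    """Committer timestamp (Unix seconds) of HEAD, via `git log -1 --format=%ct`."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())
```

A hypothetical CI step would call `has_recent_commit(last_commit_epoch(), int(time.time()))` and end the job early when it returns `False`.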

RFC: Best Practices re OPENMP - for training, evaluation and recognition

For Tesseract 5, what are the best practices regarding OpenMP? Is it still true that:
1. OpenMP is **needed** for training, so tesseract and the training tools should be built with `--enable-openmp`.
2. For...

Shreeshrii

Labels: question, OpenMP
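For recognition on many pages, a pattern often suggested in Tesseract discussions is to cap OpenMP at one thread per process with the standard `OMP_THREAD_LIMIT` environment variable and parallelize across pages instead. Whether that is still the best practice for Tesseract 5 is exactly what this RFC asks, so treat the Python sketch below as an illustration of the pattern, not an answer. The helper names are hypothetical; only the plain `tesseract IMAGE OUTPUTBASE -l LANG` call is used.

```python
import os
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Mapping, Optional


def single_thread_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with OpenMP capped at one thread."""
    env = dict(os.environ if base is None else base)
    env["OMP_THREAD_LIMIT"] = "1"  # standard OpenMP environment variable
    return env


def ocr_command(image: str, outputbase: str, lang: str = "eng") -> List[str]:
    """Basic CLI call: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", image, outputbase, "-l", lang]


def ocr_many(images: List[str], workers: int = 4) -> List[int]:
    """OCR each image in its own single-threaded tesseract process, `workers` at a time."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    env = single_thread_env()

    def run(image: str) -> int:
        outputbase = os.path.splitext(image)[0]  # page1.png -> page1 (tesseract adds .txt)
        return subprocess.run(ocr_command(image, outputbase), env=env, check=True).returncode

    # The threads only wait on child processes; the OCR itself runs in separate processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, images))
```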

output true CER for checkpoints (at least the final one)

AFAICS, `lstmtraining` produces two types of figures for measuring the error:
1. **bag-of-character training error** (on `list.train`): this is shown as
   - `char train=%.3f%%` every 100 iterations
   - `Finished! Error...

bertsky

Labels: feature request
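The difference between a bag-of-characters error and a true CER is easiest to see side by side. In this Python sketch, the bag-of-characters function follows the description in the issue (compare character counts and ignore order); it is an illustration, not a copy of `lstmtraining`'s code. The CER is the usual Levenshtein edit distance divided by the ground-truth length.

```python
from collections import Counter


def bag_of_chars_error(truth: str, ocr: str) -> float:
    """Order-insensitive error: summed per-character count differences / len(truth)."""
    if not truth:
        return 0.0 if not ocr else 1.0
    diff = Counter(truth)
    diff.subtract(Counter(ocr))  # positive: missing chars, negative: extra chars
    return sum(abs(n) for n in diff.values()) / len(truth)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def cer(truth: str, ocr: str) -> float:
    """Character error rate: edit distance normalized by the ground-truth length."""
    if not truth:
        return 0.0 if not ocr else 1.0
    return edit_distance(truth, ocr) / len(truth)
```

For `truth="abc"` and `ocr="cba"`, the bag-of-characters error is 0 while the CER is 2/3. A single substitution (`"abd"`) counts as two bag errors (one missing `c`, one extra `d`) but only one edit.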

Invisible glyph bounds at wrong positions in PDF

### Environment
* **Tesseract Version**: Tesseract Open Source OCR Engine v5.0.0.20190623 with Leptonica (downloaded from https://digi.bib.uni-mannheim.de/tesseract/)
* **Platform**: Windows 7 64-bit, 6.1.7601
### Call:
"C:\Program Files\Tesseract-OCR\tesseract" scan.tif scan-ocr pdf...

THausherr

Labels: PDF

Tesseract misses whitelisted characters

### Environment
* **Tesseract Version**: 5.0.1
* **Platform**: Windows 10 (64-bit)
### Current Behavior:
The correct output is returned *without* whitelisting enabled, but an empty string is returned *with* whitelisting...

eicksl

Labels: allowlist / denylist

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable

102

I have tested the latest release, 3.05, on Windows to OCR an Arabic document to a searchable PDF, and when I select text in the output PDF file it seems to be stored in the opposite order (left...

tbadran

Labels: bug, PDF, RTL

Add a variable to set single pattern without config file

Since the documentation notes that ["Best results may be obtained by having a single pattern in the file."](https://tesseract-ocr.github.io/tessdoc/APIExample-user_patterns), it might be a good idea to have an appropriate variable, something like `user_pattern` or `tessedit_pattern`...

bo-bac

Labels: feature request, API
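Until such a variable exists, a single pattern can be passed without a hand-written config file by generating a one-line patterns file on the fly and pointing the existing `--user-patterns PATH` command-line option at it. The Python sketch below only writes the file and builds the argument list. The helper names are hypothetical, and whether the pattern actually improves recognition depends on the engine and model in use.

```python
import os
import tempfile
from typing import List, Optional


def write_single_pattern(pattern: str, directory: Optional[str] = None) -> str:
    """Write one pattern (plus a trailing newline) to a temporary file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".patterns", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(pattern.rstrip("\n") + "\n")
    return path


def command_with_pattern(image: str, outputbase: str, patterns_path: str) -> List[str]:
    """tesseract CLI call that points --user-patterns at the generated file."""
    return ["tesseract", image, outputbase, "--user-patterns", patterns_path]
```

The caller is responsible for deleting the temporary file (for example with `os.remove`) after tesseract has finished.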
