tesseract
Missing text from the image with high quality of 300 dpi
I have a 300 dpi image that has been converted to grayscale. When I print the result of pytesseract.image_to_string with config="--psm 6", I get the output below:
&- EUROCORPORATION PRODUTTORE D&D Costruzioni SRL CLIENTE D&D Costruzioni SRL
PORTA ROMANA - VIA METASTASIO 48 50124 (FI) Mario Nobile - Mob. 3381558401
FABRIZIO PIPOLO CF Produttore 06725110487
3357566640 Email Cliente [email protected]
23/02/2022 08:00 Email Produttore | [email protected]
OE
| Mattina | Dalle 08:00 Alle 12:00 PER LA LOGISTICA: CHIAMARE PER COMUNICARE ORARIO Florec 3200436592 https://www.google.com/maps/@43.7602578,11.2378078,3a, 75y ,244.78h,76.14t/data=!3m7!1e1!3m5! 1sd6ERXwcpkiHcDdpt_LkqMA!2e0!6shttps:%2F% 2Fstreetviewpixels-pa.googleapis.com%2Fv1%2Fthumbnail% 3Fpanoid%3Dd6ERXwcpkiHcDdpt_LkqMA%26cb_client% 3Dmaps-_sv.tactile.gps%26w%3D203%26h%3D100%26yaw% 3D322.34424%26pitch%3D0%26thumbfov%3D100!7i116384! 818192
170904.1 Calcinacci puliti Solido non pulverulento El EUROCORPORATION EUROCORPORATION S.r.I. C.F. - P.iva 05235640488 R.E.A. 531452 Capitale Sociale € 100.000,000 Via de’ Cattani 178 - 50145 Firenze | Tel. 055 7222419 Fax 055 7227520 [email protected] | [email protected] | www.eurocorporation.it Pag. 1 di 2 23/02/2022
Environment
- Tesseract Version: 4
- Commit Number: https://github.com/jyotiyadav94/ImageProcessing/blob/main/pytesseract_layoutlmv3.ipynb
- Platform: Windows
Current Behavior:
Some strings in the image, such as Unit. Loc, Operatore, and Data Richiesta, are missing from the output.
Expected Behavior:
All the strings in the image should be detected.
Image - https://github.com/jyotiyadav94/ImageProcessing/blob/main/dataset/11page-0.PNG
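For reference, the call described above can be sketched roughly as follows. This is a minimal reconstruction, not the exact notebook code: the helper names build_config and ocr_image are hypothetical, the local file name is assumed to be a downloaded copy of the linked image, and the optional --dpi flag is an extra that the original call did not pass.

```python
from typing import Optional


def build_config(psm: int = 6, dpi: Optional[int] = None) -> str:
    """Build the option string passed to Tesseract via pytesseract's `config`."""
    parts = [f"--psm {psm}"]
    if dpi is not None:
        # Assumption: tells Tesseract the resolution explicitly instead of
        # relying on image metadata. Not part of the original report.
        parts.append(f"--dpi {dpi}")
    return " ".join(parts)


def ocr_image(path: str, psm: int = 6, dpi: Optional[int] = None) -> str:
    """Run OCR on a grayscale copy of the image at `path`."""
    # Imported lazily so build_config() works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        gray = img.convert("L")  # grayscale, as described in the report
        return pytesseract.image_to_string(gray, config=build_config(psm, dpi))


# Example (assumes a local copy of the linked image):
# print(ocr_image("11page-0.PNG"))
```

Calling ocr_image with the defaults matches the reported configuration (--psm 6 only); passing dpi=300 is one variation that may be worth trying, with no guarantee that it recovers the missing fields.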
We don't support 3rd party tools like pytesseract. We also don't support Tesseract 4.x.
Try again with Tesseract 5.2.0.
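To confirm which version is actually being picked up before retrying, something like the sketch below may help. It assumes the tesseract binary is on PATH and that tesseract --version prints a dotted version number in its first line. The helper names are hypothetical, and the "5 or later" threshold is taken only from the comment above.

```python
import re
import subprocess
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first major.minor.patch triple from `tesseract --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_version(binary: str = "tesseract") -> Tuple[int, ...]:
    """Ask the Tesseract binary for its version (assumes it is on PATH)."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    # Some older builds print the version banner to stderr instead of stdout.
    return parse_version(result.stdout or result.stderr)


# Example:
# version = installed_version()
# print(version, "supported" if version >= (5,) else "too old, upgrade first")
```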
@amitdo How can I install Tesseract 5.2.0 in a Dockerfile? Locally, brew install tesseract worked.