Traineddata path issue on Windows 10.

Open gety9 opened this issue 2 years ago • 8 comments

When I run:

python -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png

I get:

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:UsersGetyAppDataLocalProgramsPythonPython38libsite-packagestable_ocrtessdata/table-ocr.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'table-ocr\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

(Note that the directory separators are missing from the file path.)

The file does exist.

I tried setting the TESSDATA_PREFIX environment variable, but got the same error.

I also tried specifying the path on the command line: python -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png --tessdata-dir C:\Users\Btycoon\AppData\Local\Programs\Python\Python38\Lib\site-packages\table_ocr\tessdata

I am on Windows 10.

gety9, Mar 15 '22

Sorry, I have very little knowledge of Windows 10. I'll leave this issue open for a while in case anyone else has a suggestion.

eihli, Mar 20 '22

Here is my workaround.

https://github.com/eihli/image-table-ocr/blob/49205462a3fb68240fd6a3d441ae7cf979b43daa/table_ocr/ocr_image/init.py#L30

On the linked line, replace all backslashes with forward slashes: tessdata_dir.replace("\\", "/").

After making the necessary changes, the program works. Windows 10, Python 3.10.4.

MikuAuahDark, Apr 12 '22
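A plausible explanation for the missing separators in the error message, assuming the installed pytesseract version tokenizes its config string with Python's shlex.split in POSIX mode: there, an unquoted backslash escapes the next character and is itself dropped. The sketch below reproduces that with a hypothetical path modeled on the error message, and shows that the forward-slash workaround survives tokenization. The "-l table-ocr --tessdata-dir ..." string is an illustration, not necessarily the package's exact config.

```python
import shlex

# Hypothetical Windows path, modeled on the one in the error message above.
tessdata_dir = r"C:\Users\Gety\AppData\Local\Programs\Python\Python38\Lib\site-packages\table_ocr\tessdata"

# Assumption: the config string is split with shlex.split (POSIX mode), where an
# unquoted backslash escapes the following character and is removed.
broken = shlex.split(f"-l table-ocr --tessdata-dir {tessdata_dir}")
print(broken[-1])
# C:UsersGetyAppDataLocalProgramsPythonPython38Libsite-packagestable_ocrtessdata

# The workaround: forward slashes are ordinary characters to shlex.
fixed_dir = tessdata_dir.replace("\\", "/")
fixed = shlex.split(f"-l table-ocr --tessdata-dir {fixed_dir}")
print(fixed[-1])
# C:/Users/Gety/AppData/Local/Programs/Python/Python38/Lib/site-packages/table_ocr/tessdata
```

Tesseract then appends "/table-ocr.traineddata" to whatever directory it receives, which matches the shape of the path in the error.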

Also, I found that the Tesseract data built into my own installation works better than the data shipped with this package, so I removed the related line entirely.

MikuAuahDark, Apr 12 '22

(Quoting MikuAuahDark: Replace all backslashes with forward slashes: tessdata_dir.replace("\\", "/").)

Thank you! I had been trying to find the problem all day.

GeniusBroccoli, May 08 '22

If anyone wants to submit a patch to make this more portable across Linux/Windows, please do!

eihli, Dec 15 '22
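Along the lines of the portability patch requested above, one option (a sketch, not the project's code) is to convert separators only when running on Windows. The standard library's PureWindowsPath.as_posix() does the same job as the replace() workaround; on Linux/macOS the path is passed through unchanged, since a backslash is a legal filename character there rather than a separator. The helper name tessdata_dir_for_config is hypothetical.

```python
import os
from pathlib import PureWindowsPath


def tessdata_dir_for_config(path: str, windows: bool = os.name == "nt") -> str:
    """Hypothetical helper: prepare a directory path for a Tesseract config string.

    On Windows, backslash separators are rewritten as forward slashes, which
    survive shlex-style tokenization and are still accepted as separators.
    Elsewhere, the path is returned unchanged.
    """
    if windows:
        return PureWindowsPath(path).as_posix()
    return path


print(tessdata_dir_for_config(r"C:\Users\Gety\tessdata", windows=True))
# C:/Users/Gety/tessdata
```

Paths containing spaces (for example under C:\Program Files) would still be split into separate tokens, so they would need separate handling.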

I am also getting the same error. Did you solve this problem?

ajay27bhat, May 23 '23

Hello! Your email has been received. Thank you!

rucxiaowen, May 23 '23

How do I run this project after making the changes? I'm new to this, so could you please explain how to run it? Thanks.

ajay27bhat, May 23 '23