Tika server 2.9.1 PDF Tesseract OCR

Open Tarik37 opened this issue 2 years ago • 0 comments

Hello, I'm a beginner and need your help. I use Tika server to extract metadata and text with the OCR strategy set to auto. On native PDF documents there is no problem, as the processing time is low, but on scanned PDF files (hundreds of pages) I hit the request timeout, whether I call the server through Python or curl. Is there a way to configure the tika-config.yml file so that OCR processes all the pages with strategy auto? Thanks in advance.
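
For context, here is a minimal sketch of the kind of Python call described above. It is not the poster's actual code. It assumes the server listens on the default `http://localhost:9998/tika` endpoint and that the OCR strategy is passed per request through the `X-Tika-PDFOcrStrategy` header. The function names `build_tika_request` and `extract_text` and the one-hour client timeout are hypothetical choices for illustration. Raising the client timeout only keeps the client waiting longer. The server may also enforce its own task timeout, which this sketch does not change.

```python
import urllib.request

# Assumed default tika-server address; adjust to your deployment.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(pdf_bytes, ocr_strategy="auto", url=TIKA_URL):
    """Build a PUT request that asks Tika for plain text from a PDF."""
    headers = {
        "Content-Type": "application/pdf",
        "Accept": "text/plain",
        # Per-request OCR strategy for the PDF parser.
        "X-Tika-PDFOcrStrategy": ocr_strategy,
    }
    return urllib.request.Request(
        url, data=pdf_bytes, headers=headers, method="PUT"
    )


def extract_text(pdf_path, client_timeout_s=3600):
    """Send one PDF to Tika and return the extracted text.

    client_timeout_s is a hypothetical value: it only controls how long
    this client waits for the response, not how long the server works.
    """
    with open(pdf_path, "rb") as f:
        request = build_tika_request(f.read())
    with urllib.request.urlopen(request, timeout=client_timeout_s) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_text("scanned.pdf")` (a hypothetical file name) against a running server would block until Tika returns the text or the client timeout expires.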

Mar 30 '24 04:03 Tarik37