Interface ABBYY FineReader OCR with fscrawler

Open manoj4321 opened this issue 5 years ago • 4 comments

Tesseract is already integrated with fscrawler for OCR, but it fails when the data is in tabular form. I found that ABBYY FineReader OCR handles such data efficiently. Is there any provision for adding ABBYY to fscrawler?

manoj4321 avatar Mar 29 '19 16:03 manoj4321

That's a good idea. It would require implementing a new Tika parser, similar to what the Tesseract parser does. I think this should be done in Tika, though, so that the feature is available more broadly than in FSCrawler alone. cc @tballison. That being said, it can be added here.

Is this something you'd like to contribute @manoj4321?

dadoonet avatar Mar 29 '19 17:03 dadoonet

No news on this one, so let's close it. Feel free to reopen if you have any idea of how to implement it.

dadoonet avatar Jul 29 '21 12:07 dadoonet

We made integration with other OCR engines much easier in 2.x. The new feature is entirely undocumented. Ping me if you want help with this.

tballison avatar Jul 29 '21 12:07 tballison

Amazing @tballison. I'm reopening this one then.

dadoonet avatar Jul 29 '21 12:07 dadoonet