arimai comments


Results: 4 comments by arimai

cannot extract text from scanned PDF

The PDF is essentially a scanned medical document and does not contain selectable text (i.e., when you try to select, you can only select an area of an image...

cannot extract text from scanned PDF

I am using the 'magical' OCR engine Tesseract and have also specified the ocrLanguage option in node-tika. Are you sure you are able to get any results with the pdf...

cannot extract text from scanned PDF

I checked the version in master against two scanned PDFs. One didn't give me any result, and the [second](https://github.com/ICIJ/node-tika/files/342053/test2.pdf) gives the following: ``` Jun 30, 2016 9:43:15 AM org.apache.pdfbox.tools.imageio.ImageIOUtil...

cannot extract text from scanned PDF

Thanks a lot :) It works now for the PDF I attached. The first PDF, though, does not give any output, but I figured it's because of their upgrade to...