arimai
The PDF is essentially a scanned medical document and does not contain selectable text (i.e., when you try to select, you can only select an area of an image...
I am using the 'magical' OCR engine Tesseract, and I have also specified the `ocrLanguage` option in node-tika. Are you sure you are able to get any results with the PDF...
I checked the version in master against two scanned PDFs. One didn't give me any result, and the [second](https://github.com/ICIJ/node-tika/files/342053/test2.pdf) gives the following: ``` Jun 30, 2016 9:43:15 AM org.apache.pdfbox.tools.imageio.ImageIOUtil...
Thanks a lot :) It works now for the PDF I attached. The first PDF, though, does not give any output, but I figured it's because of their upgrade to...