Text inside PDF fails to be crawled
Dear WebWhiz Team,
When I try to create a chatbot and upload the following PDF, I get a 500 error code.
If I add other data files as well, the chatbot is created, but it fails to answer my questions about the PDF mentioned above, saying:
I don't know the answer to that
One possible cause is that the data crawler does not support OCR and only retrieves text from PDF files that already contain embedded text. For PDF files that appear to contain text at first glance but have no embedded text (for example, scanned pages stored as images), the crawler retrieves no data, which would lead to the issues described above.
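To help confirm or rule out this hypothesis, here is a minimal Python sketch that flags a PDF as likely image-only when most of its pages yield almost no extractable text. This is not WebWhiz's code. The function names, the `min_chars` cutoff, and the `threshold` ratio are my own assumptions for illustration, and `extract_page_texts` assumes the third-party `pypdf` package is installed.

```python
from typing import Iterable, List


def extract_page_texts(path: str) -> List[str]:
    """Return the embedded text of each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # imported here so the checks below work without it

    return [page.extract_text() or "" for page in PdfReader(path).pages]


def pages_without_text(page_texts: Iterable[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages with fewer than min_chars non-blank characters."""
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def looks_scanned(
    page_texts: Iterable[str], min_chars: int = 20, threshold: float = 0.8
) -> bool:
    """Guess whether a PDF is image-only, i.e. would need OCR to be read.

    The 20-character cutoff and the 0.8 ratio are arbitrary starting values.
    """
    texts = list(page_texts)
    if not texts:
        # No pages or nothing extracted: treat as having no usable text.
        return True
    empty_pages = pages_without_text(texts, min_chars)
    return len(empty_pages) / len(texts) >= threshold
```

Calling `looks_scanned(extract_page_texts("my-file.pdf"))` on the affected file (the file name here is a placeholder) should return `True` if the PDF really has no embedded text, in which case an OCR step would be needed before the crawler can use it.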
I hope this can be helpful for debugging this issue.
I also saw that a similar issue has been reported here: #107