open-semantic-search
'enhance_extract_text_tika_server' error message
Newbie here, so please pardon if I'm missing something:
I'm running the VM in Oracle VirtualBox under Windows 10 (all current versions).
I tried indexing a file (always a Microsoft Word document) using the browser (search-apps/files/create). The response I get is
File or directory added to queue.
The file name shows up in the Newest documents tab, but the content is never indexed.
Trying the same thing using CLI
opensemanticsearch-index-dir /path/to/filename
gets this response
Indexing new file: /path/to/filename
but the indexing never takes place. When I run this again, the response this time is
Repeating indexing of unchanged file because critical plugin(s) ['enhance_extract_text_tika_server'] failed in former run: /path/to/filename
or, on occasion
Repeating indexing of unchanged file because (additional configured) plugin(s) or options ['enhance_extract_text_tika_server_ocr_enabled'] not runned yet: /path/to/filename
As I mentioned, all documents are in Microsoft Word format, so I'm not sure what OCR has to do with it. I've seen references to the first error message but couldn't find a solution.
Thanks.
I confirm that this happens as well with .pdf and other Office formats (.xls, .xlsx), using the latest from master.
Same problem here. Honestly, Open Semantic Search seems like a wonderful tool, but it's quite a frustrating experience. I spent one week trying to install OSS on Ubuntu LTS, and the only solution was to use Debian instead, in spite of what the docs claim. Now, on Debian, the tool is installed but it doesn't index the files' content, and from what I gather here, the problem has been known since 2021 and no solution has been proposed.
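Since the failing plugin is named enhance_extract_text_tika_server, one thing that may be worth ruling out is whether the Apache Tika server the indexer talks to is reachable at all. The snippet below is only a diagnostic sketch, not a fix: it sends a GET request to Tika server's /version endpoint and reports whether anything answered. The address http://localhost:9998 is Tika server's stock default port and is an assumption here; your Open Semantic Search installation may be configured to use a different host or port. The helper name check_tika is hypothetical.

```python
import urllib.error
import urllib.request


def check_tika(base_url="http://localhost:9998", timeout=5.0):
    """Probe a Tika server and return (reachable, detail).

    base_url is an ASSUMPTION (Tika server's default port 9998);
    adjust it to whatever your installation actually uses.
    """
    # Tika server answers GET /version with a short plain-text version string.
    url = base_url.rstrip("/") + "/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return True, body
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False, str(exc)
```

Calling check_tika() from inside the VM returns (True, version string) if a Tika server answered, or (False, error text) if the request failed. A failure would suggest looking at the Tika service itself before anything else; a success only means the server is up, so the cause would have to be sought elsewhere.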