
'enhance_extract_text_tika_server' error message

Open RabbitJackTrade opened this issue 3 years ago • 2 comments

Newbie here, so please pardon if I'm missing something:

I'm running the VM in Oracle Virtual Box under Windows 10 (all current versions).

I tried indexing a file (always a Microsoft Word document) using the browser (search-apps/files/create). The response I get is

File or directory added to queue.

The file name shows up in the Newest documents tab, but the content is never indexed.

Trying the same thing using CLI

opensemanticsearch-index-dir /path/to/filename

gets this response

Indexing new file: /path/to/filename

but the indexing never takes place. When I run this again, the response this time is

Repeating indexing of unchanged file because critical plugin(s) ['enhance_extract_text_tika_server'] failed in former run: /path/to/filename

or, on occasion

Repeating indexing of unchanged file because (additional configured) plugin(s) or options ['enhance_extract_text_tika_server_ocr_enabled'] not runned yet: /path/to/filename

As I mentioned, all documents are in Microsoft Word format, so I'm not sure what OCR has to do with it. I've seen references to the first error message but couldn't find a solution.

Thanks.

RabbitJackTrade, Jul 19 '21 15:07

I confirm that this happens as well with .pdf and other Office formats (.xls, .xlsx), using the latest from master.

denispol, Feb 27 '23 16:02

Same problem here. Honestly, Open Semantic Search seems like a wonderful tool, but the experience is quite frustrating. I spent a week trying to install OSS on Ubuntu LTS, and the only solution was to use Debian instead, in spite of what the docs claim. Now, on Debian, the tool is installed but it doesn't index file contents, and from what I gather here, the problem has been known since 2021 and no solution has been proposed.

AndreaPux, Mar 13 '23 08:03
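
A possible first diagnostic for the 'enhance_extract_text_tika_server' failure reported above is to check whether an Apache Tika server is answering at all, since the plugin name suggests text extraction is delegated to one over HTTP. The sketch below is not taken from the Open Semantic Search code. It assumes the Tika server listens on Tika's default port 9998 on localhost (your installation may be configured differently), and `check_tika` is a hypothetical helper name. The `/version` endpoint is part of the standard Tika server.

```python
import urllib.error
import urllib.request


def check_tika(base_url="http://localhost:9998", timeout=5.0):
    """Return (reachable, detail) for a Tika server's /version endpoint.

    base_url is an assumption: 9998 is Tika's default port, but the
    installation being debugged may use another host or port.
    """
    url = base_url.rstrip("/") + "/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A healthy Tika server answers with a plain-text version string.
            body = response.read().decode("utf-8", errors="replace").strip()
            return True, body
    except urllib.error.HTTPError as exc:
        # Something answered, but not with a successful status code.
        return False, f"HTTP {exc.code} from {url}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure and similar.
        return False, f"cannot reach {url}: {exc}"


if __name__ == "__main__":
    reachable, detail = check_tika()
    print("Tika reachable:" if reachable else "Tika not reachable:", detail)
```

If this reports that the server cannot be reached, the extraction plugin would have nothing to talk to, which would be consistent with file names being listed while content is never indexed. If the server does answer, the cause is more likely elsewhere, and the Tika and indexer logs are the next place to look.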