
Error Elasticsearch BadRequest400Exception

Open jonathanmmm opened this issue 4 years ago • 2 comments

Hi,

Great that such a feature has been built; it makes searching across multiple machines easier. I hope somebody can help me with this error.

I have Nextcloud 18.0.3 with all apps on their newest versions. Elasticsearch is on 7.6.1, as is the ingest-attachment plugin. The machine has 8 GB of RAM (3.61 used right now) plus a swap file of around 32 GB.

The scan works fine for a while and gets through many documents.

But then it gives the error below; it is scanning again right now. I have enabled every option, such as OCR, group folders, and external folders (and I also installed tesseract).

I have Debian 9 (OMV 4) as the OS.

I get this error: Error: 117/117 │ Index: files:389122 │ Exception: Elasticsearch\Common\Exceptions\BadRequest400Exception │ Message: Error parsing document in field [content]
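One way to narrow down an error like the one above is to send a single suspect file through an attachment processor by hand, using Elasticsearch's `_ingest/pipeline/_simulate` endpoint. The following is only a minimal sketch: the address `http://localhost:9200`, the helper names, and the one-processor inline pipeline are assumptions, and the pipeline that the Nextcloud app actually installs may be configured differently.

```python
import base64
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default local Elasticsearch address


def build_simulate_body(file_bytes: bytes, field: str = "content") -> dict:
    """Build a request body for POST /_ingest/pipeline/_simulate.

    The inline pipeline holds a single attachment processor that reads
    `field`; this is an illustrative pipeline, not the one Nextcloud creates.
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "pipeline": {"processors": [{"attachment": {"field": field}}]},
        "docs": [{"_source": {field: encoded}}],
    }


def simulate(file_bytes: bytes, es_url: str = ES_URL) -> dict:
    """Send the body to Elasticsearch and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError if the server answers with a
    4xx or 5xx status.
    """
    request = urllib.request.Request(
        f"{es_url}/_ingest/pipeline/_simulate",
        data=json.dumps(build_simulate_body(file_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `simulate(open(path, "rb").read())` for the file behind a failing index ID (`path` being wherever that file lives on disk) would show whether that particular file reproduces the parsing error outside of Nextcloud.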

It is running now, but after some time it stops (as if it reaches a certain count or hits an error, and then stops).

I then first disabled the PDF setting in the OCR section (the one that warns about heavy usage), but not the other PDF setting above it (outside the OCR section), to see whether this helps. Having two PDF settings is a bit confusing. It hasn't stopped since, but it still produces the same error.

Also enabled the global icon and all.

I have an SMB share which is very big and which almost every user can access, plus a big group folder for almost everyone, and a few other big shares, such as the music library.

It mostly stops on the second account (the first account does not have much, but the second has access to the big SMB and group shares).

But it seems it has not yet reached the shared folders (it appears to scan the folder structure alphabetically).

Do I need to change some parameters in Elasticsearch?

I followed the tutorial below, but used the newest version (7.6.1 from the elastic.co website):

https://decatec.de/home-server/volltextsuche-in-nextcloud-mit-ocr/

In the Nextcloud settings I used the index name "nextcloud" but have not set this up in Elasticsearch. Under results it shows something like this: Result: 31852/31852 │ Index: files:391921 │ Status: ok │ Message: {"_index":"nextcloud","_type":"standard","_id":"files:391921","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":31733,"_primary_term":1}
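A result line like the one above can be split into its columns and its Message JSON decoded, which makes the response easier to read. This is a rough sketch: the function name is made up, and it assumes that any " │ " inside the message is a separator inserted by the table-style terminal output when it wraps a long line, not part of the JSON itself.

```python
import json


def parse_result_line(line: str) -> dict:
    """Split a result line into its labelled columns and decode the Message JSON."""
    head, _, message = line.partition("Message:")
    fields = {}
    for column in head.split("│"):
        key, sep, value = column.partition(":")
        if sep:  # skip empty trailing columns
            fields[key.strip()] = value.strip()
    # Assumption: " │ " inside the message marks a wrap point, so drop it.
    fields["Message"] = json.loads(message.replace(" │ ", "").strip())
    return fields


sample = (
    'Result: 31852/31852 │ Index: files:391921 │ Status: ok │ Message: '
    '{"_index":"nextcloud","_ver │ sion":1,"result":"created",'
    '"_shards":{"total":2,"successful":1,"failed":0}}'
)
parsed = parse_result_line(sample)
print(parsed["Index"], parsed["Message"]["result"], parsed["Message"]["_shards"])
```

In the decoded message, "result":"created" under "_index":"nextcloud" indicates the document was written to an index of that name. A "_shards" value with "total":2, "successful":1 and "failed":0 is typically what a single-node setup reports when one replica is configured but cannot be allocated, so on its own it does not point to a failure.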

I can send the systemctl status if needed.

Hope somebody can help :)

jonathanmmm • Mar 26 '20 09:03

Maybe the same error as https://github.com/nextcloud/fulltextsearch/issues/580

Happyfeet01 • May 24 '20 06:05

If it is the same error, the PR didn't fix it. I am now on Debian 10, PHP 7.4, and Nextcloud 19.0.4.

I sometimes get the following messages: Elasticsearch\Common\Exceptions\BadRequest400Exception Error parsing document in field [content]

or

Elasticsearch\Common\Exceptions\BadRequest400Exception field [content] not present as part of path [attachment.content]

Right now this error has occurred 465 times in 78k files (the total will probably count up to 1 million or so).
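Since two different messages are involved, it may help to count them separately before looking for a cause. A small sketch, assuming the exception messages have already been collected into a list of strings (how they are collected is not covered here, and the category names are made up):

```python
import re
from collections import Counter

# Patterns taken from the two messages quoted above.
PATTERNS = {
    "parse_error": re.compile(r"Error parsing document in field \[[^\]]+\]"),
    "missing_field": re.compile(
        r"field \[[^\]]+\] not present as part of path \[[^\]]+\]"
    ),
}


def classify(message: str) -> str:
    """Return the name of the first matching pattern, or 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.search(message):
            return name
    return "other"


def tally(messages) -> Counter:
    """Count how often each kind of message occurs."""
    return Counter(classify(message) for message in messages)
```

If one kind clearly dominates, that narrows the search: the first message says a document in the content field could not be parsed, while the second says the content field was not present at all.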

I still have tesseract installed but have disabled it, as it caused 98-100% CPU load on all 4 CPU cores for more than a week. Even after putting Nextcloud in maintenance mode and rebooting the whole server, it started again. Now htop no longer shows the multiple tesseract /tmp/somerandomstring --psm 4 (or similar) commands.

jonathanmmm • Nov 18 '20 17:11