ETL filling up OCR queue?

Open mosea3 opened this issue 4 years ago • 4 comments

At my company we only need full-text search on PDFs that have already been scanned and converted into text PDFs, so no OCR is needed. OCR was disabled in /etc/opensemanticsearch/etl and the ETL service was restarted.

Still, something is filling the OCR queue and converting PDFs into images (related to issue #343).

Where can I trace this activity back to its source?

2021-02-12 12_03_29-Suche

etl.txt

mosea3 avatar Feb 12 '21 11:02 mosea3

Is OCR still enabled in the web admin / config UI?

That UI writes /etc/opensemanticsearch/etl-webadmin, which overrides the settings in /etc/opensemanticsearch/etl.
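To check which of the two files still turns OCR on, a quick scan of both can help. The following is only a minimal sketch: it assumes the config files are plain text with `#` comment lines and that any OCR-related setting contains the string "ocr" in its name. The helper name `find_ocr_lines` is made up for this example and is not part of Open Semantic Search.

```python
from pathlib import Path

# Paths as named in this thread; the second file overrides the first.
CONFIG_FILES = [
    "/etc/opensemanticsearch/etl",
    "/etc/opensemanticsearch/etl-webadmin",
]


def find_ocr_lines(paths):
    """Return (path, line_number, text) for each active line mentioning OCR.

    Assumption: lines starting with '#' are comments and can be ignored.
    """
    hits = []
    for path in paths:
        file = Path(path)
        if not file.is_file():
            continue  # skip files that are not present on this system
        lines = file.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue
            if "ocr" in stripped.lower():
                hits.append((str(path), number, stripped))
    return hits


if __name__ == "__main__":
    for path, number, text in find_ocr_lines(CONFIG_FILES):
        print(f"{path}:{number}: {text}")
```

Any hit reported for etl-webadmin would take precedence over what is set in etl, so that is the first place to look if OCR appears to be disabled but the queue keeps filling.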

Mandalka avatar Feb 14 '21 14:02 Mandalka

I've got a similar issue, but in my case OCR is turned on. Running enrich later causes an error (it seems to be deprecated), and some files/images from websites get OCR'd while others don't. Thank you for this cool project and the good work you're doing.

schneipk avatar Mar 10 '21 09:03 schneipk

Same issue here. I tried the Desktop VM as well as the latest Docker Compose file. Worst of all, I don't see any errors being thrown. @Mandalka do you see the same issue with the latest build?

phretor avatar Mar 10 '21 13:03 phretor

I have the Debian package installed (ii open-semantic-search 21.12.25 all Search engine) and my problem is that I'm unable to disable OCR: whatever I do, items are always added to the OCR queue. I'm adding files with, for example: opensemanticsearch-index-dir /home/opensemanticetl/mnt/Projekte/archiv/aktuelle_Projekte -v

bmnnit avatar May 25 '22 10:05 bmnnit