open-semantic-search
ETL filling up OCR queue?
At my company we only need full-text search on PDFs that were already scanned and converted into text PDFs, so no OCR is needed. OCR was disabled in /etc/opensemantic/etl and the ETL service was restarted.
Still, something is filling the OCR queue and converting PDFs into images (related to issue #343).
Where can I trace this activity back to its source?
Is OCR still enabled in the web admin / config UI?
That UI writes /etc/opensemanticsearch/etl-webadmin, which overrides the settings in /etc/opensemanticsearch/etl.
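To see which of the two files might be re-enabling OCR, something like the following could help. It is only a sketch: it assumes the two paths named above and that any OCR-related setting contains the string "ocr" (an assumption, not a confirmed key name). The helper show_ocr_settings is a made-up name for illustration.

```shell
# Print every line mentioning "ocr" (case-insensitive) in each given file.
# Assumption: OCR-related settings contain the substring "ocr".
show_ocr_settings() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            grep -n -i 'ocr' "$f" || echo "(no OCR-related lines)"
        else
            echo "== $f (missing or unreadable)"
        fi
    done
}

# The web admin file is listed last because its settings take precedence.
show_ocr_settings /etc/opensemanticsearch/etl /etc/opensemanticsearch/etl-webadmin
```

If the second file still lists an OCR entry, the setting from the web admin UI is likely the one in effect, regardless of what the first file says.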
I've got a similar issue, but in my case OCR is turned on. Running enrich later causes an error (it seems to be deprecated), and some files/images from websites get OCR'd while others don't. Thank you for this cool project and the good work you're doing.
Same issue here. I tried the Desktop VM as well as the latest Docker Compose file. The worst part is that I don't see any errors being thrown. @Mandalka, do you see the same issue with the latest build?
I have the Debian package installed (package listing: ii open-semantic-search 21.12.25 all Search engine), and my problem is that I'm unable to disable OCR: whatever I do, items are always added to the OCR queue. I'm adding files with, e.g.: opensemanticsearch-index-dir /home/opensemanticetl/mnt/Projekte/archiv/aktuelle_Projekte -v