Add FTS OCR as an option in the menu script
Steps To Reproduce
- Use the menu.sh subitem that installs the Full Text Search platform
Expected Result
The installed search components also include https://apps.nextcloud.com/apps/files_fulltextsearch_tesseract, together with the deb packages for the Tesseract engine and the recognition data files for all languages on the underlying system. The Nextcloud app was updated recently, after a long time without updates.
Actual Result
The OCR component for Full Text Search is missing. That was logical while the app went without updates for a long time, but the situation changed recently. Would it be possible to reflect that, please?
Screenshots, Videos, or Pastebins
No response
Additional Context
Many thanks for all your work on Nextcloud VM (lite, full)!
Build Version
22.2.3
Environment
By downloading the VM
Environment Details
No response
@p-bo You are welcome to work on this if you want it implemented.
Thanks!
@Ark74 Do you think this is a valid point?
AFAIK, OCR is not well supported yet. I'm not sure whether daita finished it, or what its current state is.
Thank you both for your commitment! @enoch85 - unfortunately I'm not good enough at scripting to implement this reliably. @Ark74 - what do you think: would it be wise to ask daita and ArtificialOwl about this OCR app's status and future?
@daita and @ArtificialOwl - what is the development status of this FTS OCR module, please? Thanks for any answer(s) :-)
We have had no bad feedback for a while. The repos are migrating into nextcloud/ and we might even support it?
You need the tesseract binary on the server for it to work.
You might also need to edit some of tesseract's XML configuration for its access rights.
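The first prerequisite above can be checked from a shell before the app is enabled. This is only a minimal sketch: `require_binary` and `report_tesseract` are hypothetical helper names, not part of the app or of the Nextcloud VM scripts, and the XML configuration step is not covered because the thread does not say which file is meant.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named binary is found on the PATH.
require_binary() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether tesseract is available and, if so, list the
# language data it can see (tesseract --list-langs).
report_tesseract() {
    if require_binary tesseract; then
        echo "tesseract found"
        tesseract --list-langs
    else
        echo "tesseract missing"
    fi
}
```

Calling `report_tesseract` on the server prints whether the binary is present and, if it is, which language data files are installed.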
@ArtificialOwl
Regarding the dependencies (the tesseract engine and language data) and adjusting their configuration: that is a feasible task for the installation/maintenance scripts here, as is already done for other Nextcloud components in Nextcloud VM. If I understood correctly, the developers of these scripts need assurance that the integration is worthwhile: will compatibility of the FTS OCR component be maintained for future Nextcloud versions?
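To make the scope of such a script step concrete, here is a rough sketch that only prints the commands it would run. It is not taken from the Nextcloud VM scripts. It assumes Debian/Ubuntu package naming (`tesseract-ocr-<lang>`), a Nextcloud path of `/var/www/nextcloud`, and a web server user of `www-data`; the function names and the `NC_PATH` variable are hypothetical.

```shell
#!/bin/sh
# Build the apt package list for the engine plus the given language codes.
# Assumption: Debian/Ubuntu naming, tesseract-ocr-<lang>, with underscores
# in language codes written as hyphens (chi_sim -> tesseract-ocr-chi-sim).
tesseract_packages() {
    pkgs="tesseract-ocr"
    for lang in "$@"; do
        pkgs="$pkgs tesseract-ocr-$(printf '%s' "$lang" | tr '_' '-')"
    done
    printf '%s\n' "$pkgs"
}

# Print the install commands instead of running them, so they can be reviewed.
# NC_PATH (hypothetical) and the www-data user are assumptions about the system.
ocr_install_plan() {
    nc_path="${NC_PATH:-/var/www/nextcloud}"
    echo "apt-get install -y $(tesseract_packages "$@")"
    echo "sudo -u www-data php $nc_path/occ app:install files_fulltextsearch_tesseract"
}
```

For example, `ocr_install_plan eng deu` prints one `apt-get` line for the engine and the two language packs, followed by the `occ app:install` line for the app. Printing rather than executing keeps the sketch safe to try on a system where the assumptions above may not hold.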
From a user's point of view, it would be great to be able to search text extracted from uploaded images as well, as one is accustomed to with some other cloud storage offerings. There is also the Workflow OCR add-on, but its approach, and thus its scope of use, is a bit different from FTS OCR.
So, will FTS OCR be supported, and is it therefore meaningful to politely ask the Nextcloud VM maintainers to include it in their installation automation, please?
Thanks for bearing with me :-)
@cronlabspl are we still talking about OCR processing of users' raster images / PDF files already stored on Nextcloud, so that full text search works on them there? (The mention of some device is a bit confusing.)
In NC 24, the menu option to run an OCR scan is missing.