
Add FTS OCR as an option in the menu script

Open p-bo opened this issue 2 years ago • 10 comments

Steps To Reproduce

  1. Use the subitem in the menu.sh script that installs the Full text search platform

Expected Result

The installed search components also include https://apps.nextcloud.com/apps/files_fulltextsearch_tesseract, together with the deb packages for the tesseract engine and the recognition definition files for all languages on the underlying system. The Nextcloud app was updated recently, after a long time without updates.

Actual Result

The OCR component for Full text search is missing. That was logical while the app had not been updated and was unavailable for a long time, but the situation changed recently. Would it be possible to reflect that, please?

Screenshots, Videos, or Pastebins

No response

Additional Context

Many thanks for all your work on Nextcloud VM (lite, full)!

Build Version

22.2.3

Environment

By downloading the VM

Environment Details

No response

p-bo avatar Nov 28 '21 12:11 p-bo

@p-bo You are welcome to work on this if you want it implemented.

Thanks!

enoch85 avatar Jan 17 '22 12:01 enoch85

@Ark74 Something you think is a valid point?

enoch85 avatar Jan 31 '22 10:01 enoch85

AFAIK, OCR is not well supported yet. I'm not sure whether daita finished it or what the current state is.

Ark74 avatar Jan 31 '22 15:01 Ark74

Thank you both for your commitment! @enoch85 - unfortunately I'm not good enough at scripting to implement this reliably. @Ark74 - what do you think: would it be wise to ask daita and ArtificialOwl about the status and future of this OCR app?

p-bo avatar Feb 01 '22 18:02 p-bo

@daita and @ArtificialOwl - what is the status of development of this FTS OCR module, please? Thanks for any answer(s) :-)

p-bo avatar Feb 04 '22 13:02 p-bo

We had no bad feedback for a while. The repos migrate into nextcloud/ and we might even support it?

You need the tesseract binary on the server to have it working.
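A quick way to check that requirement is sketched below. The function names `has_binary` and `report_tesseract` are hypothetical helpers, not part of the Nextcloud VM scripts, and the `tesseract-ocr` package name assumes a Debian/Ubuntu server.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not taken from the Nextcloud VM scripts.

# Return 0 if the named binary is on PATH, 1 otherwise.
has_binary() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the OCR engine is available and which languages it knows.
report_tesseract() {
    if has_binary tesseract; then
        echo "tesseract found: $(command -v tesseract)"
        # Lists the installed recognition language data.
        tesseract --list-langs
    else
        # Assumption: Debian/Ubuntu package name.
        echo "tesseract not found; try: apt-get install tesseract-ocr"
        return 1
    fi
}
```

Calling `report_tesseract` on the server prints either the path and the installed languages or the install hint.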

ArtificialOwl avatar Feb 04 '22 14:02 ArtificialOwl

You might also need to edit some XML configuration of tesseract for its access rights.

ArtificialOwl avatar Feb 04 '22 14:02 ArtificialOwl

@ArtificialOwl

Regarding the dependencies (tesseract engine and language data) and adjusting their configuration: that is a feasible task for the installation/maintenance scripts here, as is already done for other Nextcloud components in Nextcloud VM. If I understood correctly, the developers of these scripts need assurance that the integration is worth it (will compatibility of the FTS OCR component be maintained for future Nextcloud versions?).

From a user's point of view, it would be great to be able to search in text extracted from uploaded images as well, as one is accustomed to with some other cloud storage offerings. There is also the Workflow OCR add-on for this, but its approach, and thus its scope of use, is a bit different from FTS OCR.

So, will FTS OCR be supported, and is it therefore meaningful to politely ask the Nextcloud VM maintainers to include it in their installation automation, please?

Thanks for bearing with me :-)

p-bo avatar Feb 04 '22 19:02 p-bo

@cronlabspl are we still talking about OCR processing of users' raster images / PDF files already stored on Nextcloud, so that full text search works on them there? (The mention of some device there is a bit confusing.)

p-bo avatar Mar 06 '22 05:03 p-bo

In NC 24 the menu option to do an OCR scan is missing.

Piefje01 avatar Sep 16 '22 09:09 Piefje01