fulltextsearch
Image files and external files are indexed although both are set not to be indexed.
Indexing of images in particular is extremely time-consuming and must be excluded.
- Up to ONE minute per image at 100% CPU, which makes it basically unusable for databases with a lot of images. Possibly only the EXIF information should be extracted and indexed.
BTW there is currently no option to configure and others
"files_image": "0",
"files_audio": "0",
- Content Providers:
Deck 1.13.1
[]
Files 29.0.1
{
    "files_local": "1",
    "files_external": "2",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "1",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0",
    "files_chunk_size": "2",
    "files_fulltextsearch_tesseract": {
        "version": "27.0.0",
        "enabled": "1",
        "psm": "4",
        "lang": "eng,deu,fra",
        "pdf": "1",
        "pdf_limit": "0"
    }
}
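To make the expected behaviour explicit, here is a minimal Python sketch of how I would expect these flags to be read. It is not the app's actual code: the `TYPE_EXTENSIONS` mapping and the `should_index` helper are hypothetical names introduced for illustration, and treating unmapped file types as indexable is an assumption.

```python
import os

# Hypothetical mapping from config key to file extensions (an assumption,
# not taken from the app's source).
TYPE_EXTENSIONS = {
    "files_image": {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"},
    "files_audio": {".mp3", ".flac", ".ogg", ".wav"},
    "files_pdf": {".pdf"},
}


def should_index(filename: str, config: dict) -> bool:
    """Return True if the file's content should be indexed under `config`.

    A type whose flag is "0" (or missing) is skipped; file types that are
    not in the mapping are treated as indexable (assumption).
    """
    ext = os.path.splitext(filename)[1].lower()
    for key, extensions in TYPE_EXTENSIONS.items():
        if ext in extensions:
            return config.get(key, "0") == "1"
    return True


config = {"files_image": "0", "files_audio": "0", "files_pdf": "1"}
for name in ("photo.JPG", "song.mp3", "report.pdf"):
    print(name, should_index(name, config))
```

With the settings above I would expect `photo.JPG` and `song.mp3` to be skipped and only `report.pdf` to be processed, which is not what I observe.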
Just to illustrate: the indexing finished after 2 1/2 months with an error.
- not very helpful error message - have to contact Hetzner support
- during this time no other occ command is allowed
Do you scan images?
Apparently yes, although the settings should exclude images, as shown above: "files_image": "0", "files_audio": "0".
IMO EXIF Data could be scanned at low cost
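A rough sketch of why this should be cheap: in a JPEG the EXIF block sits in an APP1 segment near the start of the file, so it can be located by walking the segment headers without decoding any pixels or running OCR. The helper name `find_exif_segment` is hypothetical, and this only returns the raw payload; parsing the individual tags would still need a proper EXIF library.

```python
import struct


def find_exif_segment(data: bytes):
    """Return the raw EXIF payload of a JPEG byte string, or None if absent."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # not at a marker: malformed or unsupported layout
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            return None
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]  # TIFF-structured EXIF data
        pos += 2 + length
    return None
```

Only the first few kilobytes of each file are touched in the typical case, which is a very different cost from one minute of OCR per image.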
Somehow it did not work: OCR within a JPG, for example. I have a lot of photos with sample numbers, but they will not be scanned.