
Fulltextsearch doesn't add new documents from external cifs/samba storage

Open lhurt opened this issue 7 years ago • 47 comments

Hi, maybe this is just a configuration issue. I've been using Nextcloud/ownCloud for a long time, at least since version 6. I'm now on Nextcloud 13.0.1 and recently (on version 12) switched from Nextant to Elasticsearch.

Indexing of local files stored in Nextcloud itself works like a charm. But for files on externally mounted SMB shares, new files are not recognized as new and thus not indexed.

Just a few remarks about my installation.

OS: Ubuntu 16.04 (i7-3773 @3.4 Ghz, 32GB RAM)
DB: Postgres 10
Webserver: Nginx

Elasticsearch 6.2.3 in Docker container with indices on SSD

Nextcloud cron selected for background job processing (not Ajax or webcron)

crontab
*/15  *  *  *  * www-data php -f /var/www/nextcloud/cron.php
*/17  *  *  *  * www-data php -f /var/www/nextcloud/occ files:scan --all --quiet
systemd daemon for live indexing
[Unit]
Description=Elasticsearch Worker for Nextcloud Fulltext Search
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/nextcloud
ExecStart=/usr/bin/php /var/www/nextcloud/occ fulltextsearch:live
Nice=19
Restart=always

[Install]
WantedBy=multi-user.target

All fulltext apps are on the most recent version, 0.61, except Full text search - Files, which is on 0.60 because no newer version is available.

Any help would be greatly appreciated.

Many thanks in advance.

lhurt avatar Apr 19 '18 14:04 lhurt

Hey @lhurt, it looks like a known problem: https://github.com/nextcloud/fulltextsearch/issues/250

zafai avatar Apr 20 '18 07:04 zafai

Thanks for the information. We'll see after the update.

lhurt avatar Apr 25 '18 04:04 lhurt

Hi, I think there is a new, unexplained term "workingdirectory". I assume this is where the Nextcloud code resides, not where the data is ("datadirectory").

ferdiga avatar Apr 30 '18 07:04 ferdiga

Please keep me updated after the release of fulltextsearch 0.7 (within the next few days).

ArtificialOwl avatar May 11 '18 11:05 ArtificialOwl

Works for me now with 0.72.

lhurt avatar May 21 '18 15:05 lhurt

Sorry. I only tested by adding files to the external storage in Nextcloud via the web interface. This works. If I add files to a samba share from a Windows client, they are not indexed.

lhurt avatar May 21 '18 16:05 lhurt

This might be an issue with the sync, and/or the event for a new file is not dispatched; therefore, fulltextsearch is not aware that a new file has been uploaded.

ArtificialOwl avatar May 23 '18 11:05 ArtificialOwl

How can this be solved as I'd suppose this to be a very common use case?

lhurt avatar May 24 '18 10:05 lhurt

If I add files to a samba share on a windows client it's not indexed.

@lhurt did you tell your cifs client that it should use version 2?

Sanookmakmak avatar May 25 '18 11:05 Sanookmakmak

@Sanookmakmak I suppose you mean the smb protocol version? I set it to smb2 but the issue remains.

lhurt avatar May 27 '18 05:05 lhurt

@Sanookmakmak Now I even went to smb3 and still the issue remains.

lhurt avatar May 29 '18 05:05 lhurt

Upgraded to 13.0.4 and unfortunately still no improvement

lhurt avatar Jun 15 '18 11:06 lhurt

Upgraded to 14.0.1 and fulltextsearch 0.99.2/3/4 and still no solution

Current steps to reproduce

  1. Store any text file on directory located on samba share
  2. Delete index
  3. Rebuild index
  4. Start live indexing
  5. Copy file from step 1 in same directory with different name
  6. Wait 1 day
  7. Search for text contained in text file from step 1
  8. Result: only one hit, with the filename of the file from step 1

Expected:

2 hits. Files from step 1 and step 5
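
The command-line part of the steps above can be sketched as a shell script. This is only a sketch under assumptions: `SHARE_DIR` and the file name `sample.txt` are placeholders that do not appear in this thread, the occ path is the one from the crontab in the first post, and the script defaults to a dry run that only prints the commands. Steps 6 to 8 (waiting and searching) are left to the web interface.

```shell
#!/bin/sh
# Sketch of reproduction steps 2, 3 and 5.
# Assumptions (not from the thread): SHARE_DIR is where the samba share is
# mounted on the machine running this script; sample.txt is a placeholder
# for the text file stored in step 1.
OCC="${OCC:-php /var/www/nextcloud/occ}"
SHARE_DIR="${SHARE_DIR:-/mnt/share}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # $OCC is intentionally unquoted so "php /path/occ" splits into words.
    run $OCC fulltextsearch:reset   # step 2: delete index
    run $OCC fulltextsearch:index   # step 3: rebuild index
    # step 4: live indexing runs separately (systemd unit or another terminal)
    run cp "$SHARE_DIR/sample.txt" "$SHARE_DIR/sample-copy.txt"   # step 5
}

repro
```

The occ commands need to run as the web server user (www-data in this setup). Note that the `cp` only reproduces the problem if it writes to the share directly, bypassing Nextcloud, just as a Windows client would.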

Very disappointing.

lhurt avatar Sep 30 '18 10:09 lhurt

Unintentionally closed. The issue still persists even with version 1.0

lhurt avatar Oct 03 '18 13:10 lhurt

With version 1.2.3 the issue seems to be resolved and everything is working as expected so far.

lhurt avatar Dec 20 '18 09:12 lhurt

Have to reopen it.

With Nextcloud 15.0.6, full text search 1.2.5, and full text search - files 1.2.6, files are not added to the index when they are created directly on the file system, e.g. when the folder is mounted as a Windows drive.

I did a complete reinstall/reindex to verify that it's not corrupted leftover data. And yes, the entry d /var/run/samba 2775 root www-data - - is also present in /usr/lib/tmpfiles.d/samba.conf on the host system.

I don't understand why it doesn't work.

lhurt avatar Apr 07 '19 05:04 lhurt

This should be fixed on NC16, please keep me updated if you have a chance to test it !

ArtificialOwl avatar Apr 07 '19 09:04 ArtificialOwl

This should be fixed on NC16, please keep me updated if you have a chance to test it !

Thanks very much for the fast reply. Since NC16 is scheduled for release on April 25th, I'll wait for it and reply then.

lhurt avatar Apr 07 '19 10:04 lhurt

After testing with NC16 I have to say that the issue wasn't fixed, unfortunately.

To reproduce I use the steps described above using windows explorer as samba client.

The interactive live index screen didn't show any activity until I copied the file in nextcloud's browser interface. Then the display was as expected.

Memory: 15 MB
┌─ Indexing  ────
│ Action: waiting
│ Provider: Files                Account: Ludwig
│ Document: 26109787
│ Info: text/plain
│ Title: _Dokumente/_ignoriere_mich/TopBankingError (kopieren).txt
│ Content size: 3988
└──
┌─ Results ────
│ Result:      1/1
│ Index: files:26109787
│ Status: ok
│ Message: {"_index":"my_nextcloud","_type":"standard","_id":"files:26109787",
│ "_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":
│ 0},"_seq_no":97697,"_primary_term":1}
└──
┌─ Errors ────
│ Error:      1/1
│ Index: files:16598
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentExcept
│ ion: field [content] not present as part of path [attachment.content]
│ 
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause

Looks like the live indexer doesn't watch inotify. Inotify is running and accessible within the docker container. I tested this with another container.

In addition to that I ran the external file notify command before copying a file in Windows explorer.

docker exec --user www-data nextcloud_php_fpm php occ files_external:notify -vvu user -p password 17
Self-test successful
added /_ignoriere_mich/TopBankingError - Kopie.txt
modified /_ignoriere_mich/TopBankingError - Kopie.txt

So Nextcloud is notified about what's going on. The live index monitor stays silent:

Memory: 12 MB
┌─ Indexing  ────
│ Action: waiting
│ Provider:                      Account: 
│ Document: 
│ Info: 
│ Title: 
│ Content size: 
└──
┌─ Results ────
│ Result:      0/0
│ Index: 
│ Status: 
│ Message: 
│ 
│ 
└──
┌─ Errors ────
│ Error:      1/1
│ Index: files:16598
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentExcept
│ ion: field [content] not present as part of path [attachment.content]
│ 
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause 

So the notification seems to be there; it just doesn't trigger the indexer.
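
To compare what notify reports with what the live indexer picks up, the changed paths can be pulled out of the notify output. The following is a minimal sketch; the function name `changed_paths` is made up for illustration, and the only line formats it assumes are the "added <path>" and "modified <path>" lines quoted above. Any other event types are ignored.

```shell
#!/bin/sh
# Extract changed paths from verbose `occ files_external:notify` output.
# Only the "added <path>" and "modified <path>" line formats seen in this
# thread are handled; everything else is skipped.
changed_paths() {
    while IFS= read -r line; do
        case "$line" in
            "added "*)    printf '%s\n' "${line#added }" ;;
            "modified "*) printf '%s\n' "${line#modified }" ;;
        esac
    done
}

# Example using the output lines from this thread:
printf '%s\n' \
    'Self-test successful' \
    'added /_ignoriere_mich/TopBankingError - Kopie.txt' \
    'modified /_ignoriere_mich/TopBankingError - Kopie.txt' |
    changed_paths | sort -u
```

In practice the notify command would be piped into `changed_paths` instead of the `printf`. The resulting list can then be checked against the titles shown in the live index screen.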

Any idea what's going wrong?

lhurt avatar May 08 '19 18:05 lhurt

New update 16.0.1 didn't improve the situation. Same behavior.

lhurt avatar May 17 '19 04:05 lhurt

@icewind1991 would you care having a look ?

ArtificialOwl avatar May 21 '19 14:05 ArtificialOwl

Are you using Ubuntu 16.04 with SMB protocol v2 or v3? Maybe this is related to icewind1991/SMB/issues/56

theroch avatar May 22 '19 07:05 theroch

Of course I use SMB > 1 as it is deprecated and Windows will only connect with workarounds that I don't want to apply.

Here's my relevant part of smb.conf

---- snip ----
client min protocol = SMB2
client max protocol = SMB3
---- snip ----

lhurt avatar May 22 '19 07:05 lhurt

Nevertheless, I just noticed that there may be an issue due to the fact that I'm using Docker. My installation is based on the fpm image, which is itself based on Debian stretch. There the smbclient version is 4.5.16, which is far behind the current 4.10.

I'll try changing the base image to fpm-alpine, which should have a 4.10 smbclient; maybe this solves it. As soon as I have results I'll post them.
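
A quick way to check whether a container's smbclient is older than a given version is a version-aware comparison. This is a sketch only: `version_lt` is a helper name invented here, it relies on `sort -V` being available (GNU coreutils, and BusyBox builds that enable it), and the two version numbers are the ones mentioned in this post rather than values read from a live system.

```shell
#!/bin/sh
# version_lt A B: succeeds when version A sorts strictly before version B.
# Uses `sort -V` for version-aware ordering (so 4.5.16 < 4.10).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Values from this thread. On a real system the installed version would come
# from something like:
#   docker exec nextcloud_php_fpm smbclient --version
installed="4.5.16"
wanted="4.10"

if version_lt "$installed" "$wanted"; then
    echo "smbclient $installed is older than $wanted"
fi
```

A plain string or numeric comparison would get this wrong, since 4.5.16 looks "larger" than 4.10 when compared character by character.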

lhurt avatar May 22 '19 09:05 lhurt

But the notify problem only applies to you if you use occ files_external:notify. As far as I can see, you are using cron with files:scan and you don't use notify as described in External Storage SMB/CIFS.

theroch avatar May 22 '19 11:05 theroch

This issue has now existed for over a year and it seems like I'm the only one having this problem. Is this really the case? Is it such an extraordinary use case?

lhurt avatar Jun 13 '19 04:06 lhurt

Still doesn't work with fulltext 1.3.6 and fulltext_files 1.3.5

Does anyone else have this working?

lhurt avatar Jul 24 '19 05:07 lhurt

I can confirm this problem with nextcloud 16.0.3 and fulltext 1.3.6 and fulltext_files 1.3.5. If I create a new file directly on the share this file is not indexed. occ files_external:notify is running and live index is running too.

theroch avatar Jul 24 '19 11:07 theroch

I tested something one time and created a new file. The file doesn't seem to be updated via notify, but after about 24h the file was included in the fulltextsearch.

theroch avatar Jul 26 '19 07:07 theroch

Same problem with N16. Going to test N17; will post the results here.

Sx3 avatar Oct 03 '19 10:10 Sx3