fulltextsearch_elasticsearch

Again about reindexing

Open oleua opened this issue 2 years ago • 7 comments

Hi! I am asking for help. Almost a week after moving from a server-installed Elasticsearch to a dockerised one, I am still struggling to get full-text search working on my instance.

Reindexing was done with the following commands:

$ sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:stop
$ curl -X DELETE localhost:9200/my-index
$ sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:reset
$ sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:index

Searching the index "my-index" with curl for the word "Ivanova" (one of our members) gives the following (user data has been stripped and anonymized):

$ curl -X GET "localhost:9200/my-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": "Ivanova"
    }
  }
}
'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 9.4341955,
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "files:6811",
        "_score" : 9.4341955,
        "_ignored" : [
          "content.keyword"
        ],
        "_source" : {
          "owner" : "user1",
          "groups" : [
            "GroupA",
            "GroupB",
            "GroupC",
            "GroupD"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "GroupA/Scanned/Ivanova.pdf",
          "users" : [ ],
          "content" : "Ivanova document",
          "tags" : [ ],
          "attachment" : {
            "date" : "2022-04-06T06:29:11Z",
            "content_type" : "application/pdf",
            "format" : "application/pdf; version=\"A-2b\"",
            "modified" : "2022-04-06T06:29:11Z",
            "language" : "uk",
            "creator_tool" : "ABBYY FineReader 14",
            "content_length" : 1612
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
            "user0" : "GroupA/Scanned/Ivanov.pdf",
            "user1" : "Scanned/Ivanov.pdf",
            "user2" : "GroupA/Scanned/Ivanov.pdf",
            "user3" : ""
          },
          "hash" : "aeca335860b2f59954q5e7fd34b174a1"
        }
      },
      {
        "_index" : "my-index",
        "_id" : "files:6812",
        "_score" : 9.4341955,
        "_ignored" : [
          "content.keyword"
        ],
        "_source" : {
          "owner" : "user0",
          "groups" : [
            "GroupA",
            "GroupB",
            "GroupC",
            "GroupD"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "GroupA/Scanned/Ivanova2.pdf",
          "users" : [ ],
          "content" : "Ivanova........................................................... ...................",
          "tags" : [ ],
          "attachment" : {
            "date" : "2022-04-01T08:00:50Z",
            "content_type" : "application/pdf",
            "format" : "application/pdf; version=\"A-2b\"",
            "modified" : "2022-04-01T08:00:50Z",
            "language" : "uk",
            "creator_tool" : "ABBYY FineReader 14",
            "content_length" : 1260
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
            "user0" : "GroupA/Scanned/Ivanova2.pdf",
            "user1" : "Scanned/Ivanova2.pdf",
            "user2" : "GroupA/Scanned/Ivanova2.pdf",
            "user3" : ""
          },
          "hash" : "e09e889376ebe62b907b8023f37d21a9"
        }
      },
      {
        "_index" : "my-index",
        "_id" : "files:1576",
        "_score" : 9.30954,
        "_source" : {
          "owner" : "user0",
          "groups" : [
            "GroupA",
            "GroupB",
            "GroupC",
            "GroupD"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "GroupA/Scanned/Ivaniva CV.pdf",
          "users" : [ ],
          "content" : "",
          "tags" : [ ],
          "attachment" : {
            "date" : "2022-06-02T07:57:28Z",
            "keywords" : "Scanned image",
            "content_type" : "application/pdf",
            "author" : "NAPS2",
            "format" : "application/pdf; version=1.4",
            "modified" : "2022-06-02T07:57:28Z",
            "language" : "lt",
            "title" : "Scanned image",
            "creator_tool" : "NAPS2",
            "content_length" : 4
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
            "user0" : "GroupA/Scanned/Ivanova CV.pdf",
            "user5" : "GroupA/Scanned/Ivanova CV.pdf",
            "user3" : ""
          },
          "hash" : "53f4a218a7ac0f31648efc0834c35199"
        }
      },
      {
        "_index" : "my-index",
        "_id" : "files:518673",
        "_score" : 8.218001,
        "_ignored" : [
          "content.keyword"
        ],
        "_source" : {
          "owner" : "user0",
          "groups" : [
            "GroupD"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "GroupA/Scanned/Ivanova Bulletin.docx",
          "users" : [
            "user8"
          ],
          "content" : "Ivanova Bulletin text text",
          "tags" : [ ],
          "attachment" : {
            "date" : "2023-06-13T09:19:00Z",
            "content_type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "author" : "Administrator",
            "modifier" : "User6",
            "modified" : "2023-06-14T13:32:00Z",
            "language" : "uk",
            "content_length" : 1701,
            "print_date" : "2023-06-14T13:30:00Z"
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
            "user0" : "GroupA/Bulls/Ivanova Bull.docx",
            "user9" : "GroupA/Bulls/Ivanova Bull.docx"
          },
          "hash" : "99a498dbb1db6f68a6be3793e30a9476"
        }
      },
      {
        "_index" : "my-index",
        "_id" : "files:6806",
        "_score" : 7.622202,
        "_source" : {
          "share_names" : {
            "user0" : "GroupA/Scanned/Photos/Ivanova",
            "user2" : "Scanned/Photos/Ivanova",
            "user3" : ""
          },
          "owner" : "user0",
          "users" : [ ],
          "groups" : [
            "GroupA",
            "GroupB",
            "GroupC",
            "GroupD"
          ],
          "circles" : [ ],
          "links" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "subtags" : [ ],
          "tags" : [ ],
          "hash" : "",
          "provider" : "files",
          "source" : "files_group_folders",
          "title" : "GroupA/Scanned/Ivanova",
          "parts" : [ ],
          "content" : ""
        }
      },
      {
        "_index" : "my-index",
        "_id" : "files:466569",
        "_score" : 5.747198,
        "_ignored" : [
          "content.keyword"
        ],
        "_source" : {
          "owner" : "user0",
          "groups" : [
            "GroupD"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "GroupA/Applications/Ivanova.docx",
          "users" : [
            "user4",
            "user5",
            "user6",
            "user7"
          ],
          "content" : "Application of Ivanova text text text",
          "tags" : [ ],
          "attachment" : {
            "date" : "2023-03-10T12:25:00Z",
            "content_type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "author" : "user5",
            "modifier" : "user10",
            "modified" : "2023-05-17T17:55:09Z",
            "language" : "uk",
            "content_length" : 1550
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
            "user0" : "GroupA/Applications/Ivanova.docx",
            "user6" : "GroupA/Applications/Ivanova.docx",
            "user3" : ""
          },
          "hash" : "f8c12747f9b240ed696385d2aff7f0fe"
        }
      }
    ]
  }
}

Next, I search for Ivanova on behalf of user0:

$ sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:search user0 Ivanova
search
> Files
 - 518673 score:0
 - 480179 score:0
 - 514585 score:0
 - 527182 score:0
 - 531276 score:0
 - 363692 score:0

The documents are stored in a group folder that is accessible to user0, user1, user2, user3 and user10, but this is what I get when searching as other users through the Nextcloud fulltextsearch app:

$ sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:search user10 Ivanova
search
> Files

$ sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:search user3 Ivanova
search
> Files
 - 97091 score:0

Consequently, users of the web UI do not get the expected results either.

Could you please explain why this is not working as expected, and what I should do to make it work? Thanks!
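
One way to quantify the mismatch is to diff the document IDs from both outputs. The following is a minimal Python sketch and not part of Nextcloud or the app: the helper names occ_ids and es_ids are made up for illustration, and the sample data is copied from the user0 outputs above. The curl query only matches on title, while the occ search presumably looks at more fields, so extra occ hits are not surprising; the title hits that are missing from the occ result are the interesting part.

```python
import re


def occ_ids(occ_output: str) -> set[str]:
    """Extract document IDs from occ result lines such as ' - 518673 score:0'."""
    return set(re.findall(r"^\s*-\s*(\d+)\s+score:", occ_output, flags=re.MULTILINE))


def es_ids(response: dict, provider: str = "files") -> set[str]:
    """Extract IDs from Elasticsearch hits whose _id looks like 'files:6811'."""
    prefix = provider + ":"
    return {
        hit["_id"][len(prefix):]
        for hit in response["hits"]["hits"]
        if hit["_id"].startswith(prefix)
    }


# Output of `occ fulltextsearch:search user0 Ivanova`, copied from above.
occ_output = """search
> Files
 - 518673 score:0
 - 480179 score:0
 - 514585 score:0
 - 527182 score:0
 - 531276 score:0
 - 363692 score:0
"""

# Only the _id values of the six curl hits above; everything else is omitted.
es_response = {
    "hits": {
        "hits": [
            {"_id": f"files:{doc_id}"}
            for doc_id in (6811, 6812, 1576, 518673, 6806, 466569)
        ]
    }
}

in_both = occ_ids(occ_output) & es_ids(es_response)
only_in_es = es_ids(es_response) - occ_ids(occ_output)

print("in both:", sorted(in_both, key=int))
print("only in Elasticsearch:", sorted(only_in_es, key=int))
```

For the data in this report, only 518673 appears in both result sets.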

oleua avatar Jul 02 '23 10:07 oleua

To debug the ES queries generated by this app, I'd recommend setting the loglevel to 0 in your config.php. The app should then log each query before sending it to the ES server (you should see a log message like Searching ES ... after running your sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:search ... command).

Unfortunately, the debug logging is currently broken, so you might need to adjust the following code a bit to actually see the query body:

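try {
    // Serialize the query parameters so that the full request body ends up in the log.
    $serializedParams = var_export($query['params'], true);
    $this->logger->debug('Searching ES: ' . $serializedParams);
    $result = $client->search($query['params']);
}
// ... the existing catch block stays unchanged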

(I will try to bring in a PR to fix the debug logging within the next few days.)

After that, analyze the JSON query that is sent to ES and is now written to your NC log. Most likely it contains a filter section that ensures users do not see results for files they have no permission to access. Try to figure out which of these filters removes your expected results. I'd guess that the share_names object is the culprit:

"share_names" : {
            "user0" : "GroupA/Applications/Ivanova.docx",
            "user6" : "GroupA/Applications/Ivanova.docx",
            "user3" : "",
          },

An ES document is only visible to its owner or, in your case, also to anyone listed in share_names. I think the share_names entries of your documents are simply missing most of your users (I'd expect users 1-10 to be listed here ...).

That would hint that your indexing process ended with errors. In that case, please reindex your documents and inspect your Nextcloud log file while leaving the loglevel set to 0.
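
To check this guess against an indexed document, the rule just described can be written down as a small Python sketch. It only encodes the rule as stated in this comment (owner plus the keys of share_names); it ignores groups, users and circles and is not the app's actual filter. The function names are hypothetical, and the sample document is the _source of hit files:466569 from the report, reduced to the relevant fields.

```python
def listed_users(source: dict) -> set[str]:
    """Users who are the owner or appear as a key of share_names."""
    listed = set(source.get("share_names", {}).keys())
    owner = source.get("owner")
    if owner:
        listed.add(owner)
    return listed


def missing_users(source: dict, expected: list[str]) -> list[str]:
    """Expected users who are neither owner nor listed in share_names."""
    return sorted(set(expected) - listed_users(source))


# Reduced _source of hit files:466569 from the curl output in the report.
doc = {
    "owner": "user0",
    "share_names": {
        "user0": "GroupA/Applications/Ivanova.docx",
        "user6": "GroupA/Applications/Ivanova.docx",
        "user3": "",
    },
}

# Users who, according to the report, can access the group folder.
expected = ["user0", "user1", "user2", "user3", "user10"]

print(missing_users(doc, expected))
```

Any user printed here would, under the stated rule, not get this document back from a search.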
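
Before reindexing, the share_names guess can also be probed directly against Elasticsearch. The Python sketch below only builds a request body; build_probe_query is a hypothetical helper, and combining a title match with an exists filter on share_names.<user> is an assumption made for illustration, not the filter the app generates. Compare it with the query from your log before drawing conclusions.

```python
import json


def build_probe_query(term: str, user: str) -> dict:
    """Match `term` in the title, restricted to documents that have a
    share_names entry for `user` (hypothetical probe, not the app's query)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": term}}],
                "filter": [{"exists": {"field": f"share_names.{user}"}}],
            }
        }
    }


body = build_probe_query("Ivanova", "user10")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of the same curl call to localhost:9200/my-index/_search that was used in the report. If the unfiltered title search returns hits and this one returns none, the share_names entries for that user are missing from the index.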

R0Wi avatar Jul 03 '23 08:07 R0Wi

Out of curiosity: https://github.com/nextcloud/fulltextsearch_elasticsearch#compatibility says the app is ONLY compatible with ES 7, but OP has used a container with ES 8.6.1. Could somebody please clarify whether this CAN even work?

it25fg avatar Jul 22 '23 19:07 it25fg

Docs are outdated. If you have a look at the composer dependencies, you'll see that since app version 26 the ES client 8.6.1 is used. So app version >=26 is ONLY compatible with ES server 8.6.x.
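
The rule stated here can be turned into a quick sanity check. This is a Python sketch under assumptions: es_server_supported is a hypothetical helper, the ES 7 branch for app versions below 26 is taken from the README statement quoted above rather than from the sources, and the version strings have to be supplied by hand (the server version is the version.number value returned by a GET on the Elasticsearch root URL).

```python
def es_server_supported(app_version: str, es_version: str) -> bool:
    """Apply the compatibility rule from this thread to two version strings."""
    app_major = int(app_version.split(".")[0])
    es_major, es_minor = (int(part) for part in es_version.split(".")[:2])
    if app_major >= 26:
        # Stated in this comment: app >= 26 works only with ES server 8.6.x.
        return (es_major, es_minor) == (8, 6)
    # Assumption based on the README quoted above: older apps target ES 7.
    return es_major == 7


print(es_server_supported("26.0.0", "8.6.1"))
print(es_server_supported("26.0.0", "7.17.0"))
```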

R0Wi avatar Jul 23 '23 13:07 R0Wi

Thanks for the clarification. As always, the real information is buried in the sources. Wouldn't it be a nice gesture towards admins who want to install this app for their users to let them know upfront that it is only compatible with a specific Elasticsearch version (in Nextcloud admin panel -> apps -> fulltextsearch_elasticsearch -> details)?

it25fg avatar Jul 23 '23 14:07 it25fg

Now I'm fully on track: NC on 26.0.4, ES on 8.6.1. And I seem to get the same result as described here: everything is fully indexed (index rebuilt with zero errors), but the query for files does not yield results. In particular:

  • I can query manually with /indexname/_search?q=a_search_term and I get the expected entries. I can verify that all the information around the document (owner, shares, groups etc.) is there.
  • A query done with occ fulltextsearch:search does not show this result, even if the querying user is the owner of the document or is verifiably listed in the 'users' array as well as in the 'share_names' dictionary. Other content providers don't seem to be affected (I have deck, and it returns the expected results).

Shall I open a new issue for this (the difference being that no group folders are involved), or what information do you need to track this down?

it25fg avatar Jul 26 '23 20:07 it25fg

Reading your observations, I understand that the issue is hiding somewhere deeper. After the initial indexing, which produced ambiguous results, Elasticsearch + Nextcloud starts working fine: when I modify, delete or add a document, cron does its work and the documents are indexed properly, regardless of whether they are in local or group folders.

oleua avatar Jul 27 '23 05:07 oleua

I have decided to open #277 for this problem. It seems too different from this issue which is about indexing.

it25fg avatar Jul 31 '23 11:07 it25fg