server icon indicating copy to clipboard operation
server copied to clipboard

[Bug]: Unified Query Allows Unwieldy SQL Statement With External Storages

Open daveatpinellas opened this issue 1 year ago • 6 comments

⚠️ This issue respects the following points: ⚠️

  • [X] This is a bug, not a question or a configuration/webserver/proxy issue.
  • [X] This issue is not already reported on Github (I've searched it).
  • [X] Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • [X] Nextcloud Server is running on 64bit capable CPU, PHP and OS.
  • [X] I agree to follow Nextcloud's Code of Conduct.

Bug description

If you have Nextcloud connected to multiple external storage devices, when you use the unified search feature it's generating a SQL statement with lots of OR statements and takes too long to complete. The CPU is pegged during this query and it continues even if you click focus out of the UI or navigate to another part of Nextcloud.

In some way probably there should be a throttle on allowing this many external storage connections in the SQL. Possibly it should break this into smaller pieces and run them in a loop to return the data.

Here is the statement that was generated, and this is taking in some cases 10 minutes to complete:

SELECT file.fileid, storage, path, path_hash, file.parent, file.name, mimetype, mimepart, size, mtime, storage_mtime, encrypted, etag, permissions, checksum, unencrypted_size FROM oc_filecache file LEFT JOIN oc_vcategory_to_object tagmap ON file.fileid = tagmap.objid LEFT JOIN oc_systemtag_object_mapping systemtagmap ON (file.fileid = systemtagmap.objectid) AND (systemtagmap.objecttype = 'files') LEFT JOIN oc_vcategory tag ON (tagmap.type = tag.type) AND (tagmap.categoryid = tag.id) AND (tag.type = 'files') AND (tag.uid = '#############') LEFT JOIN oc_systemtag systemtag ON (systemtag.id = systemtagmap.systemtagid) AND (systemtag.visibility = '1') WHERE ((tag.category  COLLATE utf8mb4_general_ci LIKE '%smith%') OR (systemtag.name  COLLATE utf8mb4_general_ci LIKE '%smith%')) AND (((storage = 12) AND ((path = 'files') OR (path LIKE 'files/%'))) OR (storage = 44) OR (storage = 47) OR (storage = 48) OR (storage = 50) OR (storage = 51) OR (storage = 54) OR (storage = 55) OR (storage = 62) OR (storage = 161) OR (storage = 205) OR (storage = 236) OR (storage = 236) OR (storage = 257) OR (storage = 294) OR (storage = 294) OR (storage = 341) OR (storage = 361)) ORDER BY mtime + '0' desc LIMIT 5

Steps to reproduce

  1. Log into an account with lots of external storage connections
  2. Click into the unified search box and enter a search
  3. Using top, watch the huge load on the server

Expected behavior

In some way, the sql statement should have a level of reasonableness that prohibits this many external storage folders to be part of the query. Or send them one after the other in a loop.

Installation method

None

Nextcloud Server version

26

Operating system

None

PHP engine version

None

Web server

None

Database engine version

None

Is this bug present after an update or on a fresh install?

None

Are you using the Nextcloud Server Encryption module?

None

What user-backends are you using?

  • [ ] Default user-backend (database)
  • [ ] LDAP/ Active Directory
  • [ ] SSO - SAML
  • [ ] Other

Configuration report

No response

List of activated Apps

Enabled:
  - activity: 2.18.0
  - bruteforcesettings: 2.6.0
  - circles: 26.0.0
  - cloud_federation_api: 1.9.0
  - comments: 1.16.0
  - dav: 1.25.0
  - federatedfilesharing: 1.16.0
  - federation: 1.16.0
  - files: 1.21.1
  - files_accesscontrol: 1.16.0
  - files_automatedtagging: 1.16.1
  - files_external: 1.18.0
  - files_retention: 1.15.0
  - files_rightclick: 1.5.0
  - files_sharing: 1.18.0
  - files_trashbin: 1.16.0
  - files_versions: 1.19.1
  - impersonate: 1.13.1
  - logreader: 2.11.0
  - lookup_server_connector: 1.14.0
  - oauth2: 1.14.0
  - photos: 2.2.0
  - privacy: 1.10.0
  - provisioning_api: 1.16.0
  - related_resources: 1.1.0-alpha1
  - serverinfo: 1.16.0
  - settings: 1.8.0
  - sharebymail: 1.16.0
  - support: 1.9.0
  - systemtags: 1.16.0
  - theming: 2.1.1
  - twofactor_backupcodes: 1.15.0
  - user_ldap: 1.16.0
  - viewer: 1.10.0
  - workflowengine: 2.8.0

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

daveatpinellas avatar Jun 05 '23 19:06 daveatpinellas

Cc @icewind1991

szaimen avatar Jun 05 '23 21:06 szaimen

Similar to https://github.com/nextcloud/server/issues/35776. For each shared file, another OR statement is required and search really gets slow. Maybe also realated: https://github.com/nextcloud/server/pull/37061

XueSheng-GIT avatar Jun 06 '23 10:06 XueSheng-GIT

Do you see a better execution time when you combine the OR (storage = x) query parts to OR (storage IN (x, y, z)) ?

kesselb avatar Jun 07 '23 08:06 kesselb

@kesselb How could that be done? The search query is generated by nextcloud and includes (in my case) around thousand storages.

XueSheng-GIT avatar Jun 07 '23 09:06 XueSheng-GIT

@XueSheng-GIT you take one query from the slow query log, modify it manually and execute it ;) If we see a improvement we can think about a patch.

kesselb avatar Jun 07 '23 10:06 kesselb

The phenomenon that the CPU utilization rises when searching with the unified search feature has also been confirmed in version 25.0.6.

cra-taketomi-yo avatar Jun 22 '23 08:06 cra-taketomi-yo

@kesselb maybe you want to have a second look at https://github.com/nextcloud/server/pull/37061 again. Search execution time did improve tremendously.

XueSheng-GIT avatar Jul 09 '23 06:07 XueSheng-GIT

Can you test whether this helps?

https://github.com/nextcloud/server/issues/38243#issuecomment-1552714039

pjft avatar Jul 21 '23 11:07 pjft