server
server copied to clipboard
Large number of DB queries during execution of a single API call
When diagnosing slow loading of chat messages in Nextcloud Talk application I have noticed that in my environment (NC 24.0.3, MySQL database), during execution of a single API call (retrieve chat messages), NC sends over 2000 DB queries.
I use the following call (same as the Nextcloud Talk Android app):
https://<server>/ocs/v2.php/apps/spreed/api/v1/chat/<room-token>?lastKnownMessageId=<msgid>&includeLastKnown=1&limit=100&setReadMarker=1&timeout=0&lookIntoFuture=0
Most of the queries (over 2100 from about 2500) are:
INSERT INTO `oc_mounts` (`storage_id`,`root_id`,`user_id`,`mount_point`,`mount_id`,`mount_provider_class`) SELECT ?,?,?,?,?,? FROM `oc_mounts` WHERE `root_id` = ? AND `user_id` = ? HAVING COUNT(*) = 0
There are about 1300 records in the x_mounts table. Majority of them (about 1100) have NULL in the mount_provider_class column.
This causes code in the registerMounts method of the UserMountCache class to ignore the existing cache entries and try to re-add them - see addToCache call.
However, the correct records are not added to the x_mounts table, since records with the same root_id and user_id already exist (the ones with NULL in the mount_provider_class column). And the same situation occurs on each subsequent API call.
As I understand the mount_provider_class column has been added recently. However, it seems that all the records that existed earlier in the x_mounts table, get NULL in the new column after upgrade. This in turn leads to the observed behaviour.
I am not sure how the issue should be fixed.
- Is it safe to delete all records from the x_mounts table? Will they be recreated properly by the registerMounts code?
- If yes to above - should the upgrade delete the records after adding the column?
Moving back to server. @szaimen this is outside of Talk's stack.
This should also all be improved since https://github.com/nextcloud/server/pull/31265 But as you run 24.0.3 there seems to still be an issue somewhere
@szaimen this is outside of Talk's stack.
all right! sorry for that. Looked like a specific issue with the talk app to me
I suspect that the oc_mounts
table in my installation contains invalid entries after upgrading to 24. See below for a sample:
- Entries for files that have been shared in Talk before upgrading to 24 have
NULL
instead ofOCA\Files_Sharing\MountProvider
in themount_provider_class
column. - A few entries at the bottom are related to files shared after upgrade to 24, and these seem OK.
I went ahead and deleted all records from the oc_mounts
table with NULL
in the mount_provider_class
column. Required records have been recreated automatically (as expected) and now all the records in the oc_mounts
table have reasonable values in the mount_provider_class
column.
This helped significantly. :wink:
Now only two INSERT INTO...
statements (down from over 2100) are executed and the total execution time of the API call was reduced from about 2.8 s to 1.4 s.
My preliminary conclusions are:
- The upgrade that adds the
mount_provider_class
should probably delete all records from theoc_mounts
table. - Two remaining
INSERT INTO...
statements require further investigation.
Any comments/suggestions?
I have investigated the remaining INSERT INTO...
statements. Turns out that the number of statement varies - between 1 and 5 (5 is the number of participants of the chat).
The code tries to update the entries for the home mounts - i.e. the ones with OC\Files\Mount\LocalHomeMountProvider
in the mount_provider_class
column.
Apparently it always tries to update the entry for the current user - so there is at least one INSERT INTO...
statement. The other entries are sometimes updated - seemingly depending on timing.
I have tried to debug the UserMountCache::registerMounts
method to find the root cause. Unfortunately this requires much deeper knowledge of the file system setup process than I currently have. :disappointed:
I hope that information above would help someone else to find/fix the issue.
I have been able to repeat the issue on a fresh Nextcloud installation (using Docker) in the following scenario.
- Create a new Nextcloud instance using version 23.0.7.
- Install Talk.
- Create an additional test account.
- Upload a few images from the test account and share them in a Talk chat with the admin user.
- Upgrade the instance to 24.0.3.
After these steps I got the following contents in the oc_mounts
table:
MariaDB [nextcloud]> select * from oc_mounts;
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
| id | storage_id | root_id | user_id | mount_point | mount_id | mount_provider_class |
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
| 1 | 1 | 1 | admin | /admin/ | NULL | NULL |
| 2 | 3 | 125 | test01 | /test01/ | NULL | NULL |
| 3 | 4 | 131 | test12 | /test12/ | NULL | NULL |
| 4 | 3 | 381 | admin | /admin/files/Talk/009-IMG_20170828_112601.jpg/ | NULL | NULL |
| 5 | 3 | 357 | admin | /admin/files/Talk/006-IMG_20170827_124444.jpg/ | NULL | NULL |
| 6 | 3 | 372 | admin | /admin/files/Talk/008-IMG_20170827_194823.jpg/ | NULL | NULL |
| 7 | 3 | 511 | admin | /admin/files/Talk/037-DSC05971.jpg/ | NULL | NULL |
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
After deleting all records from the oc_mounts
table these have been recreated automatically as expected on a next request (with correct values in the mount_provider_class
column).
MariaDB [nextcloud]> select * from oc_mounts;
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
| id | storage_id | root_id | user_id | mount_point | mount_id | mount_provider_class |
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
| 8 | 1 | 1 | admin | /admin/ | NULL | OC\Files\Mount\LocalHomeMountProvider |
| 9 | 3 | 381 | admin | /admin/files/Talk/009-IMG_20170828_112601.jpg/ | NULL | OCA\Files_Sharing\MountProvider |
| 10 | 3 | 357 | admin | /admin/files/Talk/006-IMG_20170827_124444.jpg/ | NULL | OCA\Files_Sharing\MountProvider |
| 11 | 3 | 372 | admin | /admin/files/Talk/008-IMG_20170827_194823.jpg/ | NULL | OCA\Files_Sharing\MountProvider |
| 12 | 3 | 511 | admin | /admin/files/Talk/037-DSC05971.jpg/ | NULL | OCA\Files_Sharing\MountProvider |
| 13 | 3 | 125 | test01 | /test01/ | NULL | OC\Files\Mount\LocalHomeMountProvider |
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
Apparently potential gains from #30978 cannot be fully realized due to this issue. :disappointed:
@icewind1991
https://github.com/nextcloud/server/pull/33540 should fix pre-24 mountpoints not having their mount_provider
updated.
Thanks @icewind1991! I have checked #33540 in my dev environment - works as expected. :smile:
As #33540 is now merged, I have moved the issue to Done in the Performance project and closing it now,
@starypatyk I found this issue after searching and debugging a bunch because I...
- experienced seemingly random slow HTTP responses
- watched the mariadb processlist closely and
- noticed tons of inserts into
oc_mounts
I have 7474 rows in my oc_mounts
table and 1,708 of these rows have NULL values for mount_provider_class
. I'm running the Nextcloud v24.0.6-apache Docker image.
- Should I go ahead and
DELETE FROM 'oc_mounts'
? Or is there some perhaps safer / higher-level way to fix this, maybeocc scan
or some button I should click in the web UI instead? - Does anyone want more info for debugging? I'm happy to wait to do this brute-force fix if it would be helpful.
@icewind1991 is it safe to clear the oc_mounts table in NC 24 ?
I updated to v24.0.7 I did not delete any rows in oc_mounts
. Everything feels faster. I don't seem to be having any more slow HTTP responses. :tada:
select count(*) from oc_mounts where mount_provider_class IS NULL
now returns 30.
Related thread: https://help.nextcloud.com/t/rogue-orphaned-mount-points-in-oc-mounts/118920/4
as far as I remember, the ones with "NULL" are supposed to auto-populate during access
@meonkeys - Indeed the fix mentioned here (#33540) has been backported to 24 (#34797) and released in 24.0.7.
Right on, thanks.
- Is it normal to have 7k rows in
oc_mounts
? - If not, is safe/recommended to truncate rows?
I'm asking because I am still noticing some slow responses. I've also been watching the processlist in MariaDB and I see lots of DML against the oc_mounts
table during these slow responses. Mostly INSERTs, DELETEs. Maybe some UPDATEs, too. Is this normal? Seems like a lot of churn on that table. I don't see a clear pattern as to which requests trigger this churn.
The count of rows in my oc_mounts
dips below 7k for a couple seconds, then goes back up just above 7k. It stops churning after about 10 seconds.