server Large number of DB queries during execution of a single API call

Large number of DB queries during execution of a single API call

Open starypatyk opened this issue 2 years ago • 7 comments

When diagnosing slow loading of chat messages in Nextcloud Talk application I have noticed that in my environment (NC 24.0.3, MySQL database), during execution of a single API call (retrieve chat messages), NC sends over 2000 DB queries.

I use the following call (same as the Nextcloud Talk Android app):

https://<server>/ocs/v2.php/apps/spreed/api/v1/chat/<room-token>?lastKnownMessageId=<msgid>&includeLastKnown=1&limit=100&setReadMarker=1&timeout=0&lookIntoFuture=0

Most of the queries (over 2100 from about 2500) are:

INSERT INTO `oc_mounts` (`storage_id`,`root_id`,`user_id`,`mount_point`,`mount_id`,`mount_provider_class`) SELECT ?,?,?,?,?,? FROM `oc_mounts` WHERE `root_id` = ? AND `user_id` = ? HAVING COUNT(*) = 0

There are about 1300 records in the x_mounts table. Majority of them (about 1100) have NULL in the mount_provider_class column.

This causes code in the registerMounts method of the UserMountCache class to ignore the existing cache entries and try to re-add them - see addToCache call.

However, the correct records are not added to the x_mounts table, since records with the same root_id and user_id already exist (the ones with NULL in the mount_provider_class column). And the same situation occurs on each subsequent API call.

As I understand the mount_provider_class column has been added recently. However, it seems that all the records that existed earlier in the x_mounts table, get NULL in the new column after upgrade. This in turn leads to the observed behaviour.

I am not sure how the issue should be fixed.

Is it safe to delete all records from the x_mounts table? Will they be recreated properly by the registerMounts code?
If yes to above - should the upgrade delete the records after adding the column?

Jul 18 '22 19:07 starypatyk

Moving back to server. @szaimen this is outside of Talk's stack.

Jul 20 '22 10:07 nickvergessen

This should also all be improved since https://github.com/nextcloud/server/pull/31265 But as you run 24.0.3 there seems to still be an issue somewhere

Jul 20 '22 10:07 nickvergessen

@szaimen this is outside of Talk's stack.

all right! sorry for that. Looked like a specific issue with the talk app to me

Jul 20 '22 15:07 szaimen

I suspect that the oc_mounts table in my installation contains invalid entries after upgrading to 24. See below for a sample:

obraz

Entries for files that have been shared in Talk before upgrading to 24 have NULL instead of OCA\Files_Sharing\MountProvider in the mount_provider_class column.
A few entries at the bottom are related to files shared after upgrade to 24, and these seem OK.

Jul 20 '22 21:07 starypatyk

I went ahead and deleted all records from the oc_mounts table with NULL in the mount_provider_class column. Required records have been recreated automatically (as expected) and now all the records in the oc_mounts table have reasonable values in the mount_provider_class column.

This helped significantly. :wink:

Now only two INSERT INTO... statements (down from over 2100) are executed and the total execution time of the API call was reduced from about 2.8 s to 1.4 s.

My preliminary conclusions are:

The upgrade that adds the mount_provider_class should probably delete all records from the oc_mounts table.
Two remaining INSERT INTO... statements require further investigation.

Any comments/suggestions?

Jul 24 '22 13:07 starypatyk

I have investigated the remaining INSERT INTO... statements. Turns out that the number of statement varies - between 1 and 5 (5 is the number of participants of the chat).

The code tries to update the entries for the home mounts - i.e. the ones with OC\Files\Mount\LocalHomeMountProvider in the mount_provider_class column.

obraz

Apparently it always tries to update the entry for the current user - so there is at least one INSERT INTO... statement. The other entries are sometimes updated - seemingly depending on timing.

I have tried to debug the UserMountCache::registerMounts method to find the root cause. Unfortunately this requires much deeper knowledge of the file system setup process than I currently have. :disappointed:

I hope that information above would help someone else to find/fix the issue.

Jul 24 '22 22:07 starypatyk

I have been able to repeat the issue on a fresh Nextcloud installation (using Docker) in the following scenario.

Create a new Nextcloud instance using version 23.0.7.
Install Talk.
Create an additional test account.
Upload a few images from the test account and share them in a Talk chat with the admin user.
Upgrade the instance to 24.0.3.

After these steps I got the following contents in the oc_mounts table:

MariaDB [nextcloud]> select * from oc_mounts;
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
| id | storage_id | root_id | user_id | mount_point                                    | mount_id | mount_provider_class |
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
|  1 |          1 |       1 | admin   | /admin/                                        |     NULL | NULL                 |
|  2 |          3 |     125 | test01  | /test01/                                       |     NULL | NULL                 |
|  3 |          4 |     131 | test12  | /test12/                                       |     NULL | NULL                 |
|  4 |          3 |     381 | admin   | /admin/files/Talk/009-IMG_20170828_112601.jpg/ |     NULL | NULL                 |
|  5 |          3 |     357 | admin   | /admin/files/Talk/006-IMG_20170827_124444.jpg/ |     NULL | NULL                 |
|  6 |          3 |     372 | admin   | /admin/files/Talk/008-IMG_20170827_194823.jpg/ |     NULL | NULL                 |
|  7 |          3 |     511 | admin   | /admin/files/Talk/037-DSC05971.jpg/            |     NULL | NULL                 |
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+

After deleting all records from the oc_mounts table these have been recreated automatically as expected on a next request (with correct values in the mount_provider_class column).

MariaDB [nextcloud]> select * from oc_mounts;
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
| id | storage_id | root_id | user_id | mount_point                                    | mount_id | mount_provider_class                  |
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
|  8 |          1 |       1 | admin   | /admin/                                        |     NULL | OC\Files\Mount\LocalHomeMountProvider |
|  9 |          3 |     381 | admin   | /admin/files/Talk/009-IMG_20170828_112601.jpg/ |     NULL | OCA\Files_Sharing\MountProvider       |
| 10 |          3 |     357 | admin   | /admin/files/Talk/006-IMG_20170827_124444.jpg/ |     NULL | OCA\Files_Sharing\MountProvider       |
| 11 |          3 |     372 | admin   | /admin/files/Talk/008-IMG_20170827_194823.jpg/ |     NULL | OCA\Files_Sharing\MountProvider       |
| 12 |          3 |     511 | admin   | /admin/files/Talk/037-DSC05971.jpg/            |     NULL | OCA\Files_Sharing\MountProvider       |
| 13 |          3 |     125 | test01  | /test01/                                       |     NULL | OC\Files\Mount\LocalHomeMountProvider |
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+

Apparently potential gains from #30978 cannot be fully realized due to this issue. :disappointed:

Aug 09 '22 21:08 starypatyk

@icewind1991

Aug 15 '22 09:08 PVince81

https://github.com/nextcloud/server/pull/33540 should fix pre-24 mountpoints not having their mount_provider updated.

Aug 15 '22 09:08 icewind1991

Thanks @icewind1991! I have checked #33540 in my dev environment - works as expected. :smile:

Aug 15 '22 12:08 starypatyk

As #33540 is now merged, I have moved the issue to Done in the Performance project and closing it now,

Aug 27 '22 21:08 starypatyk

@starypatyk I found this issue after searching and debugging a bunch because I...

experienced seemingly random slow HTTP responses
watched the mariadb processlist closely and
noticed tons of inserts into oc_mounts

I have 7474 rows in my oc_mounts table and 1,708 of these rows have NULL values for mount_provider_class. I'm running the Nextcloud v24.0.6-apache Docker image.

Should I go ahead and DELETE FROM 'oc_mounts'? Or is there some perhaps safer / higher-level way to fix this, maybe occ scan or some button I should click in the web UI instead?
Does anyone want more info for debugging? I'm happy to wait to do this brute-force fix if it would be helpful.

Nov 13 '22 23:11 meonkeys

@icewind1991 is it safe to clear the oc_mounts table in NC 24 ?

Nov 14 '22 15:11 PVince81

I updated to v24.0.7 I did not delete any rows in oc_mounts. Everything feels faster. I don't seem to be having any more slow HTTP responses. :tada:

select count(*) from oc_mounts where mount_provider_class IS NULL now returns 30.

Related thread: https://help.nextcloud.com/t/rogue-orphaned-mount-points-in-oc-mounts/118920/4

Nov 14 '22 16:11 meonkeys

as far as I remember, the ones with "NULL" are supposed to auto-populate during access

Nov 14 '22 16:11 PVince81

@meonkeys - Indeed the fix mentioned here (#33540) has been backported to 24 (#34797) and released in 24.0.7.

Nov 14 '22 19:11 starypatyk

Right on, thanks.

Is it normal to have 7k rows in oc_mounts?
If not, is safe/recommended to truncate rows?

I'm asking because I am still noticing some slow responses. I've also been watching the processlist in MariaDB and I see lots of DML against the oc_mounts table during these slow responses. Mostly INSERTs, DELETEs. Maybe some UPDATEs, too. Is this normal? Seems like a lot of churn on that table. I don't see a clear pattern as to which requests trigger this churn.

The count of rows in my oc_mounts dips below 7k for a couple seconds, then goes back up just above 7k. It stops churning after about 10 seconds.

Nov 15 '22 21:11 meonkeys

server server copied to clipboard

Large number of DB queries during execution of a single API call

server
server copied to clipboard