selfoss
selfoss copied to clipboard
Slow MySQL queries when reading new items
My SELFOSS database has ~500 sources and ~330k items, the oldest from Jan 2016.
Love Selfoss thanks a lot for it !
Database Version : mariadb 10.3.27
PHP Version : PHP7.0 (yes, I know)
OS : Debian 11 Bullseye
My problem: SELFOSS keeps slowing down when loading items.
I have activated the MySQL Log, and the query executed on each items load is responsible.
Here is the main query :
SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tagsitems.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM items AS items, sources AS sources
WHERE items.source=sources.id AND TRUE AND unread=1
ORDER BY items.datetime DESC, items.id DESC
LIMIT 50 OFFSET 0;
- - -
50 rows in set (6.440 sec)
Pretty slow.
Running the EXPLAIN plan for that query tells us more:
EXPLAIN SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tagsitems.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM items AS items, sources AS sources
WHERE items.source=sources.id AND TRUE AND unread=1
ORDER BY items.datetime DESC, items.id DESC
LIMIT 50 OFFSET 0;
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+---------------------------------+
| 1 | SIMPLE | sources | ALL | PRIMARY | NULL | NULL | NULL | 489 | Using temporary; Using filesort |
| 1 | SIMPLE | items | ref | source | source | 4 | alban_selfoss.sources.id | 349 | Using where |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+---------------------------------+
The query shows use of temporary table and filesort, which is slow.
Let's see how ORDER BY clauses are responsible for that by removing them:
SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM items AS items, sources AS sources
WHERE items.source=sources.id AND TRUE AND unread=1
LIMIT 50 OFFSET 0;
- - -
50 rows in set (0.001 sec)
EXPLAIN SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM items AS items, sources AS sources
WHERE items.source=sources.id AND TRUE AND unread=1
LIMIT 50 OFFSET 0;
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | sources | ALL | PRIMARY | NULL | NULL | NULL | 489 | |
| 1 | SIMPLE | items | ref | source | source | 4 | alban_selfoss.sources.id | 349 | Using where |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+-------------+
Removing ORDER BY results in in 0.01 second query and no more temp table or filesort.
Solutions
It seems this is due to MySQL's handling of queries using ORDER BY cf. documentation.
A possible fix would be to split the query in 2 parts and let PHP associate items with sources.
SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author
FROM items AS items
WHERE unread=1
ORDER BY items.datetime DESC, items.id DESC
LIMIT 50 OFFSET 0;
50 rows in set (0.001 sec)
SELECT sources.title as sourcetitle, sources.tags as tags
FROM sources AS sources;
489 rows in set (0.002 sec)
Optimization of database access is definitely something we need, I just have not gotten around to looking into it so I am thankful for any such analysis.
I wonder if we could come up with an index that would avoid the need to splitting the queries. I would expect something like CREATE INDEX loading_items ON items (source, datetime DESC, id DESC);
to help but it’s been a while since I worked with databases. And the EXPLAIN
report seems to say it is actually sorting sources
, which is confusing to me, as sources
should be already sorted according to its primary key (it is a InnoDB table).
Ha, same, fixing the problem by not changing the code was my first try ! Simple, better, right?
But adding indexes doesn't work. I've tried. I am not a great DBA in any way so someone might get it to work, but so far, nope.
As far as I understand the fact that there are two tables in the query seems to force the sortfile + tmptable
behaviour when using ORDER BY.
That explains why the sources
table index is not being used I believe. Two tables, ORDER BY, no index.
See the official documentation https://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html regarding ORDER BY optimizations for more details about that behaviour.
So
- maybe PostgreSQL works better? That would be interesting.
- I've been looking at the code to test a fix. To not break anything, I believe the change should be done in
DAOS/mysql/Items.php
Back when I had worked on this, I felt the need to add this in pgsql
. If I remember correctly, this greatly improved fetching and cleanup.
CREATE UNIQUE INDEX uid_idx ON items USING btree (source, uid);
CREATE INDEX datetime_idx ON items USING btree (datetime);
CREATE INDEX updatetime_idx ON items USING btree (updatetime);
Back from testing the code patch.
It appears that altering the DAOS/mysql/Items.php
made the page load in 1.5s versus 7.5, as expected.
Here is a code patch if you want to test.
--- a/tmp/Items.php
+++ b/daos/mysql/Items.php
@@ -320,7 +320,7 @@ class Items extends Database {
// get items from database
- $select = 'SELECT
+/* $select = 'SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM ' . \F3::get('db_prefix') . 'items AS items, ' . \F3::get('db_prefix') . 'sources AS sources
WHERE items.source=sources.id AND';
@@ -346,6 +346,55 @@ class Items extends Database {
}
return \F3::get('db')->exec($query, $params);
+*/ // Original code
+
+ $select = 'SELECT
+ items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author
+ FROM ' . \F3::get('db_prefix') . 'items AS items
+ WHERE TRUE AND ';
+ $order_sql = 'ORDER BY items.datetime ' . $order . ', items.id ' . $order;
+
+ if ($where_ids != '') {
+ // This UNION is required for the extra explicitely requested items
+ // to be included whether or not they would have been excluded by
+ // seek, filter, offset rules.
+ //
+ // SQLite note: the 'entries' SELECT is encapsulated into a
+ // SELECT * FROM (...) to fool the SQLite engine into not
+ // complaining about 'order by clause should come after union not
+ // before'.
+ $query = "SELECT * FROM (
+ SELECT * FROM ($select $where_sql $order_sql LIMIT " . $options['items'] . ' OFFSET ' . $options['offset'] . ") AS entries
+ UNION
+ $select $where_ids
+ ) AS items
+ $order_sql";
+ } else {
+ $query = "$select $where_sql $order_sql LIMIT " . $options['items'] . ' OFFSET ' . $options['offset'];
+ }
+
+
+ $items = \F3::get('db')->exec($query, $params);
+
+
+
+ $select_sources = 'SELECT
+ id, title as sourcetitle, tags as tags
+ FROM sources AS sources;';
+
+ $rawSources = \F3::get('db')->exec($select_sources, $params);
+ $sources = array();
+ // Yuck
+ foreach( $rawSources as $source ){
+ $sources[$source["id"]] = $source;
+ }
+
+ foreach( $items as $i => $item ){
+ $item[$i]['tags'] = $source[$item["source"]]["tags"];
+ $item[$i]['sourcetitle'] = $source[$item["source"]]["sourcetitle"];
+ }
+
+ return $items;
}
Note 1: My selfoss instance's code is apparently seriously out of date... It is Version 2.18 but there was no release since?
So, careful, that patch is based on the 2.18 version.
I should probably upgrade, as git seems to be right way to follow a not-so-dead project after all ;)
Note 2: The queries are breaking as soon as I try to facet (ex: load a tag) but that was expected.