Engineering-Best-Practices
Engineering-Best-Practices copied to clipboard
Add paragraph on the performance of meta_queries using EXISTS/NOT_EXISTS
Is your enhancement related to a problem? Please describe.
It is very common that developers introduce new meta_keys without setting default values for posts that already existed in the database, or that they delete meta_keys when the value is not needed instead of setting it to a default or false value.
This becomes an issue on larger database tables when users need to find some information by filtering not only by the meta_key and value but also sometimes by checking if that meta_key does not exist at all. This use case is covered in WP_Meta_Query, allowing for the use of compare => 'NOT EXISTS'
This solution should, however, be discouraged, as it generates very expensive queries that can really hammer a database performance.
Example:
- Database with 253k rows in wp_posts and 5 million rows in wp_postmeta
- There are 51,406 rows with the
_publicize_pendingmeta_key in wp_postmeta
/**
* Using NOT_EXISTS in a Meta Query generates a LEFT JOIN clause
* with an IS NULL check in WHERE
*
* Rows searched: 268,358 (all rows in the join)
*/
$query = new \WP_Query(
array(
'suppress_filters' => true,
'meta_query' => array(
array(
'key' => '_publicize_pending',
'compare' => 'NOT EXISTS',
),
),
) );
Generates:
MySQL [wordpress]> EXPLAIN SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts
-> LEFT JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id AND wp_postmeta.meta_key = 'some_key' )
-> WHERE 1=1
-> AND ( wp_postmeta.post_id IS NULL )
-> GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 10;
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
| 1 | SIMPLE | wp_posts | NULL | index | PRIMARY,post_name,type_status_date,post_parent,post_author,guid | PRIMARY | 8 | NULL | 268339 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | wp_postmeta | NULL | ref | post_id,meta_key,meta_key_meta_value | post_id | 8 | wordpress.wp_posts.ID | 19 | 100.00 | Using where; Not exists |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
In contrast, this query checking for a specific value in the declared meta_key only needs to read 89,970 rows, or roughly 70% fewer rows.
/**
* Querying only for the meta_key and value generates an INNER JOIN clause
* with a key/value match in WHERE
*
* Rows searched: 89,970
*/
$query = new \WP_Query(
array(
'suppress_filters' => true,
'meta_query' => array(
array(
'key' => '_publicize_pending',
'value' => '1',
'compare' => '!=',
),
),
) );
Generates:
MySQL [wordpress]> EXPLAIN SELECT wp_posts.ID
-> FROM wp_posts
-> INNER JOIN wp_postmeta
-> ON ( wp_posts.ID = wp_postmeta.post_id )
-> WHERE 1=1
-> AND ( ( wp_postmeta.meta_key = '_publicize_pending'
-> AND wp_postmeta.meta_value != '1' ) );
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
| 1 | SIMPLE | wp_postmeta | NULL | range | post_id,meta_key,meta_key_meta_value | meta_key_meta_value | 1534 | NULL | 89970 | 100.00 | Using where |
| 1 | SIMPLE | wp_posts | NULL | eq_ref | PRIMARY | PRIMARY | 8 | wordpress.wp_postmeta.post_id | 1 | 100.00 | Using index |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.001 sec)
@moraleida Hi 👋, you guys are doing awesome work with Engineering Practices! 🙌 Thank you for making these awesome docs.
I was recently checking this issue and would like to share my thoughts, not sure if this is a good idea for bigger picture but let me share:
NOT EXISTScheck: if the user makes a meta_key with appending_1after the actual meta_key then it will be straight forward query to just checkmeta_key=_publicize_pending_1, so the end query will be as below:
$query = new \WP_Query(
array(
'suppress_filters' => true,
'meta_key' => '_publicize_pending_1',
)
);
The above query will fetch all the posts which are pending for publicizing, and I guess will be somewhat faster than checking with the value. As we all know, in the normal WP environment index is available on meta_key only, not on the meta_value, however the WordPress VIP platform indexing meta_value in the combination of meta_key but in some limit of characters. So I guess we can think of this way as well for this issue, still correct me if I have interrupted in wrong discussions.
Thanks! 🙂
@vishalkakadiya I think your solution should work fine. It taps into the same idea that is, always look for something that exists, instead of for something that doesn't. I suggest adding different values for a meta key, but having different meta-keys will most likely work just fine too.
I don't think indexing the meta_value column makes a lot of difference in this case, because the first thing the query will do is join the tables to contain only that meta_key. But then again, if we step into the millions of records within the join, then that might be a good replacement.