Engineering-Best-Practices icon indicating copy to clipboard operation
Engineering-Best-Practices copied to clipboard

Add paragraph on the performance of meta_queries using EXISTS/NOT_EXISTS

Open moraleida opened this issue 4 years ago • 2 comments
trafficstars

Is your enhancement related to a problem? Please describe.

It is very common that developers introduce new meta_keys without setting default values for posts that already existed in the database, or that they delete meta_keys when the value is not needed instead of setting it to a default or false value.

This becomes an issue on larger database tables when users need to find some information by filtering not only by the meta_key and value but also sometimes by checking if that meta_key does not exist at all. This use case is covered in WP_Meta_Query, allowing for the use of compare => 'NOT EXISTS'

This solution should, however, be discouraged, as it generates very expensive queries that can really hammer a database performance.

Example:

  • Database with 253k rows in wp_posts and 5 million rows in wp_postmeta
  • There are 51,406 rows with the _publicize_pending meta_key in wp_postmeta
/**
 * Using NOT_EXISTS in a Meta Query generates a LEFT JOIN clause
 * with an IS NULL check in WHERE
 *
 * Rows searched: 268,358 (all rows in the join)
 */
$query = new \WP_Query(
	array(
		'suppress_filters' => true,
		'meta_query'       => array(
			array(
				'key'     => '_publicize_pending',
				'compare' => 'NOT EXISTS',
			),
		),
	) );

Generates:

MySQL [wordpress]> EXPLAIN SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  
    -> LEFT JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id AND wp_postmeta.meta_key = 'some_key' ) 
    -> WHERE 1=1  
    -> AND ( wp_postmeta.post_id IS NULL ) 
    -> GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 10;
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
| id | select_type | table       | partitions | type  | possible_keys                                                   | key     | key_len | ref                   | rows   | filtered | Extra                           |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
|  1 | SIMPLE      | wp_posts    | NULL       | index | PRIMARY,post_name,type_status_date,post_parent,post_author,guid | PRIMARY | 8       | NULL                  | 268339 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | wp_postmeta | NULL       | ref   | post_id,meta_key,meta_key_meta_value                            | post_id | 8       | wordpress.wp_posts.ID |     19 |   100.00 | Using where; Not exists         |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+

In contrast, this query checking for a specific value in the declared meta_key only needs to read 89,970 rows, or roughly 70% fewer rows.

/**
 * Querying only for the meta_key and value generates an INNER JOIN clause
 * with a key/value match in WHERE
 *
 * Rows searched: 89,970
 */
$query = new \WP_Query(
	array(
		'suppress_filters' => true,
		'meta_query'       => array(
			array(
				'key'     => '_publicize_pending',
				'value'     => '1',
				'compare' => '!=',
			),
		),
	) );

Generates:

MySQL [wordpress]> EXPLAIN SELECT wp_posts.ID
    -> FROM wp_posts
    -> INNER JOIN wp_postmeta
    -> ON ( wp_posts.ID = wp_postmeta.post_id )
    -> WHERE 1=1
    -> AND ( ( wp_postmeta.meta_key = '_publicize_pending'
    -> AND wp_postmeta.meta_value != '1' ) );
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys                        | key                 | key_len | ref                           | rows  | filtered | Extra       |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
|  1 | SIMPLE      | wp_postmeta | NULL       | range  | post_id,meta_key,meta_key_meta_value | meta_key_meta_value | 1534    | NULL                          | 89970 |   100.00 | Using where |
|  1 | SIMPLE      | wp_posts    | NULL       | eq_ref | PRIMARY                              | PRIMARY             | 8       | wordpress.wp_postmeta.post_id |     1 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.001 sec)

moraleida avatar Feb 22 '21 21:02 moraleida

@moraleida Hi 👋, you guys are doing awesome work with Engineering Practices! 🙌 Thank you for making these awesome docs.

I was recently checking this issue and would like to share my thoughts, not sure if this is a good idea for bigger picture but let me share:

  • NOT EXISTS check: if the user makes a meta_key with appending _1 after the actual meta_key then it will be straight forward query to just check meta_key = _publicize_pending_1, so the end query will be as below:
$query = new \WP_Query(
	array(
		'suppress_filters' => true,
		'meta_key'         => '_publicize_pending_1',
	)
);

The above query will fetch all the posts which are pending for publicizing, and I guess will be somewhat faster than checking with the value. As we all know, in the normal WP environment index is available on meta_key only, not on the meta_value, however the WordPress VIP platform indexing meta_value in the combination of meta_key but in some limit of characters. So I guess we can think of this way as well for this issue, still correct me if I have interrupted in wrong discussions.

Thanks! 🙂

vishalkakadiya avatar Aug 14 '21 13:08 vishalkakadiya

@vishalkakadiya I think your solution should work fine. It taps into the same idea that is, always look for something that exists, instead of for something that doesn't. I suggest adding different values for a meta key, but having different meta-keys will most likely work just fine too.

I don't think indexing the meta_value column makes a lot of difference in this case, because the first thing the query will do is join the tables to contain only that meta_key. But then again, if we step into the millions of records within the join, then that might be a good replacement.

moraleida avatar Aug 17 '21 23:08 moraleida