wordpress-seo icon indicating copy to clipboard operation
wordpress-seo copied to clipboard

get_limited_unindexed_count impacts performance

Open dgwatkins opened this issue 11 months ago • 4 comments

  • [x] I've read and understood the contribution guidelines.
  • [x] I've searched for any related issues and avoided creating a duplicate issue.

Please give us a description of what happened

To Reproduce

Step-by-step reproduction instructions

  1. Yoast plugin is checking every 15 minutes (based on transient set) if there are some posts that are not indexed
  2. Query to fetch those takes around 20 seconds to return the data
  3. Query runs on any page in admin since it's hooked to admin_init
  4. Query returns about 5273840 results on our production site (expected due to large dataset)
  5. If we try to write the same query, but using JOIN instead of subselect, we can get drastic improvement in time
  6. This seems to be a duplicate of #21854 but I am reporting again because it was already closed

Expected results

  1. To not have an expensive query running every 15m

Actual results

  1. Significant slow down each time the transient expires

Screenshots, screen recording, code snippet

If possible, please provide a screenshot, a screen recording or a code snippet which demonstrates the bug.

Image

Image

Technical info

  • If relevant, which editor is affected (or editors):
  • [ ] Block Editor
  • [ ] Gutenberg Editor
  • [ ] Elementor Editor
  • [ ] Classic Editor
  • [ ] Other:
  • Which browser is affected (or browsers):
  • [ ] Chrome
  • [ ] Firefox
  • [ ] Safari
  • [ ] Other:

Used versions

  • Device you are using: Chrome on laptop
  • Operating system: MacOs
  • PHP version: 7.4
  • WordPress version: 6.7.1
  • WordPress Theme: Custom
  • Yoast SEO version: latest
  • Gutenberg plugin version:
  • Elementor plugin version:
  • Classic Editor plugin version:
  • Relevant plugins in case of a bug:

dgwatkins avatar Jan 20 '25 19:01 dgwatkins

Hi @dgwatkins

Thank you for taking the time to report this issue in detail. It was previously reported here: (https://github.com/Yoast/wordpress-seo/issues/18065)(https://github.com/Yoast/wordpress-seo/issues/18065), but it was closed due to a lack of interaction.

I have internally shared the issue you’ve raised to request the re-opening of the above-mentioned issue.

josevarghese avatar Jan 21 '25 09:01 josevarghese

Internal Slack conversation: https://yoast.slack.com/archives/C01NCRHHN30/p1737444482043139

josevarghese avatar Jan 21 '25 09:01 josevarghese

@dgwatkins what performance profiler are you using to visualize your execution stack there?

mkormendy avatar Feb 12 '25 08:02 mkormendy

That would be Blackfire @mkormendy

dgwatkins avatar Feb 12 '25 10:02 dgwatkins

Same issue here.

The slow query is build by Indexable_Post_Indexation_Action::get_select_query()

SELECT P.ID
FROM wp_posts AS P
WHERE P.post_type IN ('post', 'page', 'attachment', 'taxonomy-page', 'location', 'article', 'pickup', 'writer', 'series')
AND P.post_status NOT IN ('auto-draft')
AND P.ID not in (
    SELECT I.object_id from wp_yoast_indexable as I
    WHERE I.object_type = 'post'
    AND I.version = 2 )
LIMIT 2

Explain:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY P range type_status_date type_status_date 164 NULL 378243 Using where; Using index
2 MATERIALIZED I ALL object_type_and_sub_type,object_id_and_type,permalink_hash_and_object_type,subpages,prominent_words,published_sitemap_index NULL NULL NULL 813223 Using where

Subquery(p.ID not in ()) is the bottleneck. It scans 813,223 rows without using index.

This query run every 15 minutes on WP Admin screen and hang up for 17.0s.

As commented by @archon810 , this query can be optimized like below:

SELECT P.ID
FROM wp_posts AS P
LEFT JOIN wp_yoast_indexable as I
       ON P.ID = I.object_id AND I.object_type = 'post' AND I.version = 2
WHERE P.post_type IN ('post', 'page', 'attachment', 'taxonomy-page', 'location', 'article', 'pickup', 'writer', 'series')
 AND P.post_status NOT IN ('auto-draft')
 AND I.version IS NULL
LIMIT 2

This query is still slow, but took 3.50s. Relatively faster. But this query improvement solutions were closed.

I'm trying to solve this issue on my side.

  1. Register my_yoast_cache_warming wp-cron every 5 minutes.
  2. Clear transient and Indexable_Post_Indexation_Action->get_limited_unindexed_count at the hook.

I will report the result if the solution above succeed.

F.Y.I

Info Spec
MySQL AWS RDS db.m6g.large
Post count 40,000
WP 6.8.1
PHP 8.3.20

fumikito avatar Jun 27 '25 04:06 fumikito