wp-rocket
Allow excluding part of the website from the preload feature
Before submitting an issue please check that you’ve completed the following steps:
- Made sure you’re on the latest version
- Used the search feature to ensure that the bug hasn’t been reported before
Describe the bug
In some specific cases, customers might want to preload only part of the website. That's useful when the site owner is sure that specific URLs are not being visited at all.
Expected behavior
We should allow excluding URL patterns from the preload feature.
Additional context
There are a couple of possible approaches. One of them is described below.
First, we need to make sure that only the desired sitemap is loaded and fetched. We can use an existing filter to achieve that: https://github.com/wp-media/wp-rocket/blob/b39567786ccc5c321ace147e1ff9dcfaa6fa2eed/inc/Engine/Preload/Controller/LoadInitialSitemap.php#L74
Then we need to add a filter that excludes URLs matching specific regex patterns from being preloaded or added to the database. If a pattern matches the current URL, we'd check whether the URL exists in the database:
- if so, it'll get removed
- if not, we won't add it
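The decision described above can be sketched as a small pure function. This is only an illustration: the function names, the `'remove'`/`'skip'`/`'keep'` return values, and the assumption that patterns are regex bodies without delimiters (and without a `#` character) are hypothetical and not taken from the WP Rocket code base.

```php
<?php
// Hypothetical helpers for illustration; not part of WP Rocket.

/**
 * Returns true when the URL matches at least one exclusion pattern.
 * Assumption: patterns are regex bodies without delimiters and without '#'.
 */
function url_matches_exclusion( string $url, array $patterns ): bool {
	foreach ( $patterns as $pattern ) {
		// An invalid pattern makes preg_match() return false, so it never matches.
		if ( 1 === @preg_match( '#' . $pattern . '#i', $url ) ) {
			return true;
		}
	}
	return false;
}

/**
 * Decides what to do with a URL:
 * - 'remove': excluded and already stored, so the row should be deleted.
 * - 'skip':   excluded and not stored, so nothing is added.
 * - 'keep':   not excluded, so the normal preload handling applies.
 */
function decide_preload_action( string $url, array $patterns, bool $exists_in_db ): string {
	if ( ! url_matches_exclusion( $url, $patterns ) ) {
		return 'keep';
	}
	return $exists_in_db ? 'remove' : 'skip';
}
```

For example, with the pattern `/tag/(.*)`, a stored tag archive URL would yield `'remove'`, a tag URL that was never stored would yield `'skip'`, and any other URL would yield `'keep'`.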
In the UI we'll introduce a text area for exclusions just below the Activate Preload option.
With the above solution, the only case in which we would still preload an undesired URL is cache lifespan expiration. We could also consider applying the pattern there, but that would require us either to match it directly against the file path or to convert the file path to a URL.
Backlog Grooming (for WP Media dev team use only)
- [ ] Reproduce the problem
- [ ] Identify the root cause
- [ ] Scope a solution
- [ ] Estimate the effort
Related - https://secure.helpscout.net/conversation/1999988402/366464/
Related - https://secure.helpscout.net/conversation/2000056403/366486/
For that, I was wondering if we could save the output of the filter somewhere (in a transient, for example) and, only if it differs from the previous one, run a cron or an Action Scheduler (AS) task to clean the old URLs from the database.
Outside of that case, we can simply prevent new URLs matching the regex from being added to the database.
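A rough sketch of the change-detection idea, assuming the filter output is an array of pattern strings. The function name and the storage key are hypothetical, and a plain array stands in for WordPress transients (`get_transient()` / `set_transient()`) so that the snippet runs on its own.

```php
<?php
/**
 * Returns true when the exclusion patterns differ from the last stored set,
 * and records the new fingerprint. Hypothetical helper, not WP Rocket code.
 *
 * @param array $patterns Current output of the exclusion filter.
 * @param array $store    Stand-in for transient storage, passed by reference.
 */
function exclusions_changed( array $patterns, array &$store ): bool {
	sort( $patterns ); // Reordering the same patterns should not count as a change.
	$fingerprint = md5( implode( "\n", $patterns ) );
	$previous    = $store['preload_exclusions_hash'] ?? null;

	if ( $previous === $fingerprint ) {
		return false;
	}

	$store['preload_exclusions_hash'] = $fingerprint;
	return true;
}
```

Only when this returns true would the cleanup task need to be scheduled. Note that the very first call reports a change, because nothing has been stored yet.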
What do you think @piotrbak @engahmeds3ed ?
Related - https://secure.helpscout.net/conversation/2001890228/366891?folderId=3864735
Related: https://secure.helpscout.net/conversation/2000380354/366561
No, please, we don't need an additional cron job. I checked the preload code and I can see only two places that create or update DB cache rows:
- When the optimizations are done on the page (after applying the rocket_buffer filter): https://github.com/wp-media/wp-rocket/blob/6e499c21b143497d36ef54e4d682d1044e464a4b/inc/Engine/Preload/Subscriber.php#L192-L198
- When a URL's cache is cleared and that URL is not in the cache reject URIs: https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/ClearCache.php#L34-L39
What I propose here is to add a new method to this trait:
https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/CheckExcludedTrait.php
The method will expose a filter, which we can call rocket_preload_exclude_urls, and check whether the current URL matches what the filter returns. We would call this method here:
https://github.com/wp-media/wp-rocket/blob/6e499c21b143497d36ef54e4d682d1044e464a4b/inc/Engine/Preload/Subscriber.php#L175-L174
and also here: https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/ClearCache.php#L33
This will prevent new rows from being added to the DB for the excluded patterns. For URLs that are already in the DB, we can handle them from the automatic purge method that starts here: https://github.com/wp-media/wp-rocket/blob/6e499c21b143497d36ef54e4d682d1044e464a4b/inc/Engine/Preload/Subscriber.php#L374
and calls this: https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/ClearCache.php#L33
There we can check whether the row exists and, if so, delete it.
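To make the proposal concrete, here is a minimal sketch of what such a trait method and its two call sites could look like. Only the filter name `rocket_preload_exclude_urls` comes from this thread. The trait, class, and method names are hypothetical, an in-memory array stands in for the preload table, and `apply_filters()` is stubbed so the snippet runs outside WordPress.

```php
<?php
// Minimal stand-in for WordPress' apply_filters(), only so the sketch runs on its own.
$GLOBALS['sketch_filters'] = [];
if ( ! function_exists( 'apply_filters' ) ) {
	function apply_filters( $hook_name, $value ) {
		foreach ( $GLOBALS['sketch_filters'][ $hook_name ] ?? [] as $callback ) {
			$value = $callback( $value );
		}
		return $value;
	}
}

trait PreloadExclusionSketch {
	/**
	 * Checks the URL against the regex patterns returned by the proposed filter.
	 * Assumption: patterns are regex bodies without delimiters and without '#'.
	 */
	protected function is_excluded_by_filter( string $url ): bool {
		$patterns = (array) apply_filters( 'rocket_preload_exclude_urls', [] );
		foreach ( $patterns as $pattern ) {
			if ( ! is_string( $pattern ) || '' === $pattern ) {
				continue; // Ignore anything that is not a usable pattern.
			}
			if ( 1 === @preg_match( '#' . $pattern . '#i', $url ) ) {
				return true;
			}
		}
		return false;
	}
}

class PreloadRowsSketch {
	use PreloadExclusionSketch;

	/** @var array URL => status, standing in for the preload DB table. */
	public $rows = [];

	// Write path: an excluded URL never gets a new row.
	public function add_or_update( string $url ): void {
		if ( $this->is_excluded_by_filter( $url ) ) {
			return;
		}
		$this->rows[ $url ] = 'pending';
	}

	// Purge path: an excluded URL that already has a row loses it.
	public function on_purge( string $url ): void {
		if ( $this->is_excluded_by_filter( $url ) ) {
			unset( $this->rows[ $url ] );
			return;
		}
		$this->rows[ $url ] = 'pending';
	}
}
```

The point of the design is that no separate cleanup job is needed: existing rows for excluded URLs disappear the next time the purge path touches them.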
What do you think, @CrochetFeve0251 @piotrbak?
@CrochetFeve0251 has a concern about the case where the user clears the cache, or the automatic cache clearing runs, before the user has applied the new filter.
After some discussion, we agreed to keep it simple and do it as follows (from the customer's perspective):
- Use the filter to stop preloading some URLs (regex)
- Clear the cache & preload OR wait till the next automatic cache clear.
- As groomed above, we will delete the rows from the DB.
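From the customer side, the first step might look like the snippet below. It assumes the filter keeps the name proposed in this thread (`rocket_preload_exclude_urls`) and receives an array of regex patterns. The callback name and the patterns are examples only.

```php
<?php
/**
 * Adds regex patterns for URLs that should not be preloaded.
 * Example callback; the patterns are placeholders for a real site's needs.
 *
 * @param mixed $patterns Patterns already registered by other callbacks.
 * @return array
 */
function my_site_preload_exclusions( $patterns ) {
	$patterns   = (array) $patterns;
	$patterns[] = '/author/(.*)';
	$patterns[] = '/tag/(.*)';

	// Drop duplicates in case another callback added the same pattern.
	return array_values( array_unique( $patterns ) );
}

// add_filter() only exists inside WordPress; the guard keeps the snippet runnable elsewhere.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'rocket_preload_exclude_urls', 'my_site_preload_exclusions' );
}
```

After adding the callback, the customer would clear the cache and preload, or wait for the next automatic cache clear, as listed above.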
Related https://secure.helpscout.net/conversation/2003129371/367151/
@piotrbak, a possible enhancement is to allow prioritizing the preload of specific items, for example a sitemap with the most viewed posts. But I'm not sure if the Action Scheduler can facilitate that. Ticket - https://secure.helpscout.net/conversation/2006557268/367868?folderId=3864735
@DahmaniAdame Created separate issue about this enhancement: https://github.com/wp-media/wp-rocket/issues/5428
Related - https://secure.helpscout.net/conversation/2007775038/368118/
Related - https://secure.helpscout.net/conversation/2003558807/367275/
related https://secure.helpscout.net/conversation/2011691051/369058/
Related: https://secure.helpscout.net/conversation/2012325277/369162/
@wp-media/qa Within the same PR the following issue is fixed: https://github.com/wp-media/wp-rocket/issues/5445
Please don't forget to include it in the tests. 🙏
Related: https://secure.helpscout.net/conversation/2022394131/371525/
Related: https://secure.helpscout.net/conversation/2029091199/373032/