wp-rocket icon indicating copy to clipboard operation
wp-rocket copied to clipboard

Allow excluding part of the website from the preload feature

Open piotrbak opened this issue 2 years ago • 14 comments

Before submitting an issue please check that you’ve completed the following steps:

  • Made sure you’re on the latest version
  • Used the search feature to ensure that the bug hasn’t been reported before

Describe the bug In some specific cases, customers might want to preload only part of the website. That's useful when the site owner is sure that specific URLs are not being visited at all.

Expected behavior We should allow excluding patterns from the preload feature.

Additional context There are a couple of approaches possible there. One of them is described below.

First, we need to make sure that only desired sitemap is loaded and fetched. We can use existing filter to achieve that: https://github.com/wp-media/wp-rocket/blob/b39567786ccc5c321ace147e1ff9dcfaa6fa2eed/inc/Engine/Preload/Controller/LoadInitialSitemap.php#L74

Later, we need to add specific filter to exclude specific regex URL patterns from being preloaded/added to database. If pattern matches current URL, we'd check if URL exists in db:

  • if so, it'll get removed
  • if not, we won't add it

In the UI we'll introduce a text area for exclusions just below the Activate Preload option.

With the above solution, the only possibility that we'll preload not desired URL is cache lifespan expiration. We could also consider using the pattern there, but it would require us to use it directly on filepath or change the filepath to URL.

Backlog Grooming (for WP Media dev team use only)

  • [ ] Reproduce the problem
  • [ ] Identify the root cause
  • [ ] Scope a solution
  • [ ] Estimate the effort

piotrbak avatar Sep 05 '22 14:09 piotrbak

Related - https://secure.helpscout.net/conversation/1999988402/366464/

DahmaniAdame avatar Sep 07 '22 10:09 DahmaniAdame

Related - https://secure.helpscout.net/conversation/2000056403/366486/

DahmaniAdame avatar Sep 07 '22 11:09 DahmaniAdame

For that I was wondering if we can save somewhere (like a transient) the output of the filter and only if it is different than the old one then run a CRON or a AS task to clear to clean the old URLs in the database.

Outside this case we can just prevent from adding new URL matching the regex in the database.

What do you think @piotrbak @engahmeds3ed ?

CrochetFeve0251 avatar Sep 08 '22 07:09 CrochetFeve0251

Related - https://secure.helpscout.net/conversation/2001890228/366891?folderId=3864735

DahmaniAdame avatar Sep 08 '22 11:09 DahmaniAdame

Related: https://secure.helpscout.net/conversation/2000380354/366561

viobru avatar Sep 08 '22 12:09 viobru

No plz, we don't need additional CRON. I checked the preload code and I can see only two places that are creating/updating DB cache rows as below:

  1. When the optimizations are done on the page (after applying rocket_buffer filter) https://github.com/wp-media/wp-rocket/blob/6e499c21b143497d36ef54e4d682d1044e464a4b/inc/Engine/Preload/Subscriber.php#L192-L198

  2. When this url's cache is cleared and this url is not in the cache reject URIs https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/ClearCache.php#L34-L39

What I propose here is to make a new method inside this trait https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/CheckExcludedTrait.php That will have a filter, we can call it rocket_preload_exclude_urls and check if the current url is inside this filter return and call this method here: https://github.com/wp-media/wp-rocket/blob/6e499c21b143497d36ef54e4d682d1044e464a4b/inc/Engine/Preload/Subscriber.php#L175-L174

and here also https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/ClearCache.php#L33

and this will prevent adding new rows into the DB for those excluded patterns and for the urls that are already there in the DB, we can handle this from the automatic purge method that starts here: https://github.com/wp-media/wp-rocket/blob/6e499c21b143497d36ef54e4d682d1044e464a4b/inc/Engine/Preload/Subscriber.php#L374

and calls this https://github.com/wp-media/wp-rocket/blob/c53d551c38788c3ac1e171cf60b52af4c777c204/inc/Engine/Preload/Controller/ClearCache.php#L33

so here we can check if the row exists, delete it from there.

what do u think @CrochetFeve0251 @piotrbak

engahmeds3ed avatar Sep 09 '22 11:09 engahmeds3ed

@CrochetFeve0251 has a concern regarding, When the user clears cache or even the automatic cache is run before the user applied the new filter.

After some discussions, We agreed to simply do it as follows (from the customer perspective):

  1. Use the filter to stop preloading some url (regex)
  2. Clear the cache & preload OR wait till the next automatic cache clear.
  3. As groomed above, we will delete the rows from the DB.

engahmeds3ed avatar Sep 12 '22 09:09 engahmeds3ed

Related https://secure.helpscout.net/conversation/2003129371/367151/

camilamadronero-zz avatar Sep 13 '22 14:09 camilamadronero-zz

@piotrbak, a possible enhancement is to allow prioritizing preload specific items, maybe a sitemap with the most viewed posts. But I'm not sure if the Action Scheduler can facilitate that. Ticket - https://secure.helpscout.net/conversation/2006557268/367868?folderId=3864735

DahmaniAdame avatar Sep 14 '22 07:09 DahmaniAdame

@DahmaniAdame Created separate issue about this enhancement: https://github.com/wp-media/wp-rocket/issues/5428

piotrbak avatar Sep 14 '22 11:09 piotrbak

Related - https://secure.helpscout.net/conversation/2007775038/368118/

DahmaniAdame avatar Sep 15 '22 11:09 DahmaniAdame

Related - https://secure.helpscout.net/conversation/2003558807/367275/

DahmaniAdame avatar Sep 16 '22 05:09 DahmaniAdame

related https://secure.helpscout.net/conversation/2011691051/369058/

camilamadronero-zz avatar Sep 19 '22 21:09 camilamadronero-zz

Related: https://secure.helpscout.net/conversation/2012325277/369162/

vmanthos avatar Sep 22 '22 12:09 vmanthos

@wp-media/qa Within the same PR the following issue is fixed: https://github.com/wp-media/wp-rocket/issues/5445

Please don't forget to include it in the tests. 🙏

piotrbak avatar Sep 26 '22 10:09 piotrbak

Related: https://secure.helpscout.net/conversation/2022394131/371525/

joejoe04 avatar Sep 30 '22 13:09 joejoe04

Related: https://secure.helpscout.net/conversation/2029091199/373032/

joejoe04 avatar Oct 06 '22 13:10 joejoe04