wp-rocket icon indicating copy to clipboard operation
wp-rocket copied to clipboard

Closes 5394: exclude urls from preload

Open CrochetFeve0251 opened this issue 3 years ago • 1 comments

Description

Added a way to exclude uris from the preload.

For that we added a new check with a filter that check with the url to determine if it is excluded from the preload. Then we used that filter where we update and add the urls to exclude them or delete them.

Fixes #5394

Type of change

  • [ ] Enhancement (non-breaking change which improves an existing functionality)

Is the solution different from the one proposed during the grooming?

No.

How Has This Been Tested?

  • [ ] Automated Test
  • [ ] Manual Test

Checklist:

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] Any dependent changes have been merged and published in downstream modules

CrochetFeve0251 avatar Sep 15 '22 11:09 CrochetFeve0251

@CrochetFeve0251 Would it be possible to include the following issue directly in this PR? https://github.com/wp-media/wp-rocket/issues/5445

It sounds like they're touching exactly the same part of the code and I believe we could solve it by just adding the Never Cache URLs values to the new filter.

piotrbak avatar Sep 19 '22 11:09 piotrbak

Moved to in-progress till we handle slack discussion

Mai-Saad avatar Sep 27 '22 13:09 Mai-Saad

@CrochetFeve0251 Is this one ready to be reviewed again?

piotrbak avatar Oct 05 '22 12:10 piotrbak

@CrochetFeve0251 Is this one ready to be reviewed again?

@piotrbak it is

CrochetFeve0251 avatar Oct 05 '22 12:10 CrochetFeve0251

@CrochetFeve0251 Currently, when any URL is added in the Preload exclusion text area the whole cache is cleared which is not something that should be happening.

This also affects QA testing, for example when trying to check if partial cache clearing will result in the removal of an excluded URL from the database.

Can you please look into this? 🙏

PS: The PR is in "Ready for review". So whoever reviews it could also check this change as well.

vmanthos avatar Oct 07 '22 07:10 vmanthos

@vmanthos I updated the code a bit following your comment, can you test again?

remyperona avatar Oct 13 '22 20:10 remyperona

@Tabrisrp When permalinks have a trailing / the change works.

When their structure doesn't have a trailing / excluded URLs with a trailing / are preloaded.

I'm getting mixed results for URLs added in the Never Cache URL(s) text area, i.e. some of them are still preloaded although they shouldn't.

This has also to do with how we currently handle URLs added there. For them to be excluded, a trailing / has to be added or not depending on whether one exists or not in the permalinks.

I've created a spreadsheet to depict my findings: https://docs.google.com/spreadsheets/d/1oMq-k0PqYeIriS1DdfJ5p99ayGNhs1mJZ4VjYYP1DlQ/edit?usp=sharing

From a user's perspective, exclusions should work regardless of whether they add a trailing / or not in both the preload exclusion area and the Never Cache URL(s).


Also, URLs like the following are not currently excluded:

  1. https://vasilis.rocketlabsqa.ovh/pariatur-ipsam-reiciendis-aliquam-perferendis-eos-corporis?source=fb_campaign
  2. https://vasilis.rocketlabsqa.ovh/nisi-quos-repudiandae-magni#contact

@piotrbak Should we handle these?

About URLs containing query strings (example 1), once we allow preloading them using custom sitemaps, we should consider if we'll offer the ability to exclude them using the UI. Of course, customers can edit the sitemaps to remove the ones they don't want to be preloaded.

About 2, it may make sense to not preload the URL without the anchor link.

vmanthos avatar Oct 14 '22 06:10 vmanthos

@vmanthos

About URLs containing query strings (example 1), once we allow preloading them using custom sitemaps, we should consider if we'll offer the ability to exclude them using the UI. Of course, customers can edit the sitemaps to remove the ones they don't want to be preloaded.

Yes, I believe we should allow the exclusions there, but I think only when they're specified directly like: /pariatur-ipsam-reiciendis-aliquam-perferendis-eos-corporis?(.*) What do you think?

About 2, it may make sense to not preload the URL without the anchor link.

Not sure if I understood correctly. Do you think it makes sense to save URLs with the anchor to the DB at all? 🤔

piotrbak avatar Oct 14 '22 13:10 piotrbak

I believe we should allow the exclusions there, but I think only when they're specified directly

This seems good.

Do you think it makes sense to save URLs with the anchor to the DB at all?

If a URL having an anchor is added to the exclusion area, we should exclude it from preload, i.e. treat it as though it didn't have the anchor link.

vmanthos avatar Oct 14 '22 13:10 vmanthos

Can we summarize what needs to happen here to get the PR ready?

remyperona avatar Oct 14 '22 14:10 remyperona

@Tabrisrp The only question here is about how this PR will work together with https://github.com/wp-media/wp-rocket/pull/5491. This type of exclusions should match, right? /pariatur-ipsam-reiciendis-aliquam-perferendis-eos-corporis?(.*)

I'll send you the description + documentation URLs once confirmed with Lucy and we're 👌 here.

piotrbak avatar Oct 14 '22 14:10 piotrbak

@Tabrisrp Here's the detailed wording for this textarea: Title: Exclude URLs (it should be URLs not URI, what do you think?) Description: Specify URLs to be excluded from the preload feature (one per line). More info More info links:

https://docs.wp-rocket.me/article/1721-exclude-urls-from-being-preloaded
6349682bde258f5018eb456d
https://fr.docs.wp-rocket.me/article/1722-exclure-url-du-prechargement
63496d5b8a552811521e52d2

Placeholder: /author/(.) Below the textarea: Use (.) wildcards to address multiple URLs under a given path. image

piotrbak avatar Oct 14 '22 15:10 piotrbak

setting field is updated with all the info. For the query string, we can test that in the other PR once we have this merged

remyperona avatar Oct 14 '22 17:10 remyperona

@Tabrisrp Still, when permalinks don't have a trailing / and the exclusion has, the URL is preloaded.

Logging $regex and $url here: https://github.com/wp-media/wp-rocket/blob/324bf97608d7dcd5aa92ea3374838256abca508e/inc/Engine/Preload/Controller/CheckExcludedTrait.php#L45

I'm getting:

Regex: /consectetur-bibendum-eu/
URL: https://vasilis.rocketlabsqa.ovh/consectetur-bibendum-eu

which is why the RegEx doesn't exclude the URL.

untrailingslashit() the $regex here: https://github.com/wp-media/wp-rocket/blob/324bf97608d7dcd5aa92ea3374838256abca508e/inc/Engine/Preload/Controller/CheckExcludedTrait.php#L40

works but then we'll have issues if, e.g. a (.*) is added after the path.

Can you please take a look into this? 🙏

vmanthos avatar Oct 17 '22 06:10 vmanthos

After the latest changes this is working as expected except for this scenario:

  • Permalinks settings without a trailing /
  • Never Cache URLs with a trailing / -> preloaded & cached as per #5509.

@piotrbak Is this something that should be handled now?

vmanthos avatar Oct 18 '22 06:10 vmanthos

@vmanthos I think we're okay here for now, it's a bit edgy and we already have an issue for that

piotrbak avatar Oct 18 '22 08:10 piotrbak