Closes 5394: exclude urls from preload
Description
Added a way to exclude uris from the preload.
For that we added a new check with a filter that check with the url to determine if it is excluded from the preload. Then we used that filter where we update and add the urls to exclude them or delete them.
Fixes #5394
Type of change
- [ ] Enhancement (non-breaking change which improves an existing functionality)
Is the solution different from the one proposed during the grooming?
No.
How Has This Been Tested?
- [ ] Automated Test
- [ ] Manual Test
Checklist:
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules
@CrochetFeve0251 Would it be possible to include the following issue directly in this PR? https://github.com/wp-media/wp-rocket/issues/5445
It sounds like they're touching exactly the same part of the code and I believe we could solve it by just adding the Never Cache URLs values to the new filter.
Moved to in-progress till we handle slack discussion
@CrochetFeve0251 Is this one ready to be reviewed again?
@CrochetFeve0251 Is this one ready to be reviewed again?
@piotrbak it is
@CrochetFeve0251 Currently, when any URL is added in the Preload exclusion text area the whole cache is cleared which is not something that should be happening.
This also affects QA testing, for example when trying to check if partial cache clearing will result in the removal of an excluded URL from the database.
Can you please look into this? 🙏
PS: The PR is in "Ready for review". So whoever reviews it could also check this change as well.
@vmanthos I updated the code a bit following your comment, can you test again?
@Tabrisrp When permalinks have a trailing / the change works.
When their structure doesn't have a trailing / excluded URLs with a trailing / are preloaded.
I'm getting mixed results for URLs added in the Never Cache URL(s) text area, i.e. some of them are still preloaded although they shouldn't.
This has also to do with how we currently handle URLs added there. For them to be excluded, a trailing / has to be added or not depending on whether one exists or not in the permalinks.
I've created a spreadsheet to depict my findings: https://docs.google.com/spreadsheets/d/1oMq-k0PqYeIriS1DdfJ5p99ayGNhs1mJZ4VjYYP1DlQ/edit?usp=sharing
From a user's perspective, exclusions should work regardless of whether they add a trailing / or not in both the preload exclusion area and the Never Cache URL(s).
Also, URLs like the following are not currently excluded:
-
https://vasilis.rocketlabsqa.ovh/pariatur-ipsam-reiciendis-aliquam-perferendis-eos-corporis?source=fb_campaign -
https://vasilis.rocketlabsqa.ovh/nisi-quos-repudiandae-magni#contact
@piotrbak Should we handle these?
About URLs containing query strings (example 1), once we allow preloading them using custom sitemaps, we should consider if we'll offer the ability to exclude them using the UI. Of course, customers can edit the sitemaps to remove the ones they don't want to be preloaded.
About 2, it may make sense to not preload the URL without the anchor link.
@vmanthos
About URLs containing query strings (example 1), once we allow preloading them using custom sitemaps, we should consider if we'll offer the ability to exclude them using the UI. Of course, customers can edit the sitemaps to remove the ones they don't want to be preloaded.
Yes, I believe we should allow the exclusions there, but I think only when they're specified directly like:
/pariatur-ipsam-reiciendis-aliquam-perferendis-eos-corporis?(.*)
What do you think?
About 2, it may make sense to not preload the URL without the anchor link.
Not sure if I understood correctly. Do you think it makes sense to save URLs with the anchor to the DB at all? 🤔
I believe we should allow the exclusions there, but I think only when they're specified directly
This seems good.
Do you think it makes sense to save URLs with the anchor to the DB at all?
If a URL having an anchor is added to the exclusion area, we should exclude it from preload, i.e. treat it as though it didn't have the anchor link.
Can we summarize what needs to happen here to get the PR ready?
@Tabrisrp The only question here is about how this PR will work together with https://github.com/wp-media/wp-rocket/pull/5491.
This type of exclusions should match, right? /pariatur-ipsam-reiciendis-aliquam-perferendis-eos-corporis?(.*)
I'll send you the description + documentation URLs once confirmed with Lucy and we're 👌 here.
@Tabrisrp Here's the detailed wording for this textarea: Title: Exclude URLs (it should be URLs not URI, what do you think?) Description: Specify URLs to be excluded from the preload feature (one per line). More info More info links:
https://docs.wp-rocket.me/article/1721-exclude-urls-from-being-preloaded
6349682bde258f5018eb456d
https://fr.docs.wp-rocket.me/article/1722-exclure-url-du-prechargement
63496d5b8a552811521e52d2
Placeholder: /author/(.)
Below the textarea: Use (.) wildcards to address multiple URLs under a given path.

setting field is updated with all the info. For the query string, we can test that in the other PR once we have this merged
@Tabrisrp Still, when permalinks don't have a trailing / and the exclusion has, the URL is preloaded.
Logging $regex and $url here:
https://github.com/wp-media/wp-rocket/blob/324bf97608d7dcd5aa92ea3374838256abca508e/inc/Engine/Preload/Controller/CheckExcludedTrait.php#L45
I'm getting:
Regex: /consectetur-bibendum-eu/
URL: https://vasilis.rocketlabsqa.ovh/consectetur-bibendum-eu
which is why the RegEx doesn't exclude the URL.
untrailingslashit() the $regex here:
https://github.com/wp-media/wp-rocket/blob/324bf97608d7dcd5aa92ea3374838256abca508e/inc/Engine/Preload/Controller/CheckExcludedTrait.php#L40
works but then we'll have issues if, e.g. a (.*) is added after the path.
Can you please take a look into this? 🙏
After the latest changes this is working as expected except for this scenario:
- Permalinks settings without a trailing
/ - Never Cache URLs with a trailing
/-> preloaded & cached as per #5509.
@piotrbak Is this something that should be handled now?
@vmanthos I think we're okay here for now, it's a bit edgy and we already have an issue for that