wp-rocket icon indicating copy to clipboard operation
wp-rocket copied to clipboard

Preload - Query strings in Sitemaps can't be preloaded anymore

Open WordPresseur opened this issue 3 years ago • 11 comments

Before submitting an issue please check that you’ve completed the following steps:

  • Made sure you’re on the latest version ✅
  • Used the search feature to ensure that the bug hasn’t been reported before ✅

Describe the bug In the previous version of the preload feature, it was possible to preload query strings found in a sitemap if the parameters were set as cacheable. It's not possible anymore since the query strings are removed before the URL is stored in the DB: https://github.com/wp-media/wp-rocket/blob/253255a7b2f68472b099eb0a2fa4e7867b555a1a/inc/Engine/Preload/Database/Queries/Cache.php#L185

To Reproduce Steps to reproduce the behavior:

  1. Create a custom sitemap with query strings
  2. Add it to the preloading
  3. Launch the preload
  4. URLs won't be preloaded

Expected behavior Preloading Query Strings should still be possible since this was possible before.

Additional context Slack convo: https://wp-media.slack.com/archives/C08N8J6VC/p1663058379862659

Backlog Grooming (for WP Media dev team use only)

  • [ ] Reproduce the problem
  • [ ] Identify the root cause
  • [ ] Scope a solution
  • [ ] Estimate the effort

WordPresseur avatar Sep 13 '22 09:09 WordPresseur

Request https://secure.helpscout.net/conversation/2003218245/367180/

camilamadronero-zz avatar Sep 14 '22 20:09 camilamadronero-zz

line 185 seems not to be the only one cutting out the querystring in cache.php for preload, as well the following lines in cache.php strip the querystring:

line 125: $url = untrailingslashit( strtok( $resource['url'], '?' ) );

line 226: $url = strtok( $url, '?' );

please fix, because I depend on preload querystrings

marcloeb avatar Sep 14 '22 21:09 marcloeb

Reproduce the problem

The issue is easily reproducible.

Identify the root cause

The root issue is due to the fact the preload has been guarded against query strings.

Scope a solution

As we don't want to remove the guard for everyone we can implement the solution and activate it using a filter named rocket_preload_query_string.

Then we will have to remove guard inside the Cache class at theses levels: get_rows_by_url, create_or_nothing, create_or_update by wrapping them inside a condition based on the result from the filter rocket_preload_query_string.

Then we will have to make the preload to pass to complete for that inside the Preload Subscriber we will have to change the logic from the update_cache_row method from : https://github.com/wp-media/wp-rocket/blob/103abf7be9d8acd6ea3abae742b533bfd9cd4c32/inc/Engine/Preload/Subscriber.php#L188 to:

if ( (! apply_filters('rocket_preload_query_string', false) &&  isset( $_GET ) && is_array( $_GET ) ) && 0 < count( $_GET ) || ( $this->query->is_pending( $url ) && $this->options->get( 'do_caching_mobile_files', false ) ) ) { // phpcs:ignore WordPress.Security.NonceVerification.Recommended
			return;
		}

Finally we will have to check if the URL contains a cache query string in the PreloadUrl class:

public function has_cached_query_string(string $url) {
 		$queries = wp_parse_url( $url, PHP_URL_QUERY);
               return count(array_intersect($queries, get_rocket_cache_query_string())) > 0;
}

	public function preload_url( string $url ) {
           if (! $this->has_cached_query_string( $url ) ) {
            $this->query->make_status_complete( $url );
			return;
          }
          $url = $this->format_url($url);

To format the url we will use that function:

public function format_url(string $url) {
 		$queries = ksort(wp_parse_url( $url, PHP_URL_QUERY));
                $url = strtok( $url, '?' );
               return add_query_arg($queries, $url);
}

We need to use the format function in this following places:

Estimate the effort

Effort S

CrochetFeve0251 avatar Sep 28 '22 13:09 CrochetFeve0251

Related: https://secure.helpscout.net/conversation/2023325526/371698?folderId=2135277

vmanthos avatar Oct 03 '22 11:10 vmanthos

Related: https://secure.helpscout.net/conversation/2028936698/372958/

mifrero avatar Oct 06 '22 09:10 mifrero

We need to prevent adding ignored qs or any non-cache Qs to the cache table + see what else is needed on the slack discussion

Mai-Saad avatar Oct 14 '22 06:10 Mai-Saad

Related: https://secure.helpscout.net/conversation/2036856301/374701/

mifrero avatar Oct 14 '22 17:10 mifrero

Related: https://secure.helpscout.net/conversation/2062887866/380557/

viobru avatar Nov 10 '22 17:11 viobru

Related: https://secure.helpscout.net/conversation/2077564693/384213?folderId=1213662

WordPresseur avatar Nov 28 '22 10:11 WordPresseur

https://secure.helpscout.net/conversation/2080316579/385107?folderId=377611

webtrainingwheels avatar Nov 29 '22 00:11 webtrainingwheels

Blocking it since in 3.16 we'll introduce big change to the database. Implementing it now will introduce more complexity and points of failure for this major.

piotrbak avatar Jul 12 '23 14:07 piotrbak

This is not something that can be implemented soon. We might reopen this in the future.

piotrbak avatar Jul 04 '24 12:07 piotrbak