wp-sitemaps icon indicating copy to clipboard operation
wp-sitemaps copied to clipboard

Provider `$object_type` and `$sub_type` need to be sanitized for the rewrite rules to match in some cases

Open pbiron opened this issue 4 years ago • 3 comments

Describe the bug

The regexes used for the rewrite rules (both currently as in #150) would not match in certain cases without the $object_type and $sub_type properties of Core_Sitemaps_Provider being sanitized (most likely via santize_title() like the REST API does for it's routes).

To Reproduce Steps to reproduce the behavior:

I came across these cases while testing #150...I tried them just to see what would happen :-) Since the buggy behavior they represent wasn't really in scope for #150 I figured it would be better to open a separate issue than include something about this in my review of that PR.

  1. register a public taxonomy with spaces in the name (e.g., register_taxonomy( 'this is a test', 'post' );
  2. add a term to that taxonomy and assign it to at least one post
  3. Load the sitemap index in your browser
  4. See something like https://example.com/wp-sitemap-taxonomies-this%20is%20a%20test-1.xml
  5. Click on that link and see that you get a 404 (because the rewrite rule wasn't matched...since it's regex doesn't allow a space)

Similarly,

  1. create and register a custom provider, whose $object_type = 'this is a test';
  2. add the other necessary methods to the custom provider so that it both shows up in the index (e.g., return >= 1 from max_num_pages()) and get_url_list() returns an array with at least 1 loc).
  3. Load the sitemap index in your browser
  4. See something like https://example.com/wp-sitemap-this%20is%20a%20test-1.xml
  5. Click on that link and see that you get a 404 (because the rewrite rule wasn't matched...since it's regex doesn't allow a space)

Expected behavior

I'd expect to see:

https://example.com/wp-sitemap-taxonomies-this-is-a-test-1.xml

and

https://example.com/wp-sitemap-this-is-a-test-1.xml

in both step 4's above and that clicking on those links correctly shows the relevant sitemaps

Additional context

I know the examples above are kind of pathological, but more real world cases are certainly possible.

Also, note that similar case for CPTs is handled correctly (e.g., register_post_type( 'this is atest', array( 'public' => true ) ) because register_post_type() sanitizes the post type (with sanitize_key().) So, register_taxonomy() should do the same? Although even in that case, custom provider $object_type should still be sanitized.

pbiron avatar Apr 28 '20 21:04 pbiron

Object sub types (e.g. custom taxonomies):

Hmm, we could do sanitize_key() in \Core_Sitemaps_Provider::get_sitemap_url, but then we also need to do it in places using \Core_Sitemaps_Provider::get_object_sub_types

Object types (e.g. custom providers):

Not sure we should really do something here. The object_type property is only really used for rewrites. If someone adds spaces there, they're doing_it_wrong()

So, register_taxonomy() should do the same?

https://core.trac.wordpress.org/ticket/14910 https://core.trac.wordpress.org/ticket/16600 https://core.trac.wordpress.org/ticket/28058

Note that it does sanitize_title_with_dashes() for rewrite slugs and query vars.

swissspidy avatar Apr 29 '20 06:04 swissspidy

Object sub types (e.g. custom taxonomies):

Hmm, we could do sanitize_key() in \Core_Sitemaps_Provider::get_sitemap_url, but then we also need to do it in places using \Core_Sitemaps_Provider::get_object_sub_types

I haven't had time to investigate where the sanitizing should happen. but those 2 seem good places to start.

And I would think that sanitize_title() would be more in line with current WP practice than sanitize_key(), as in both places we're talking about URLs.

Object types (e.g. custom providers):

Not sure we should really do something here. The object_type property is only really used for rewrites. If someone adds spaces there, they're doing_it_wrong()

It's not just spaces, e.g., $object_type = 'Foo'; would also fail, given that the rewrite regex is ([a-z]+?).

Adding a _doing_it_wrong() in Core_SItemaps::register_sitemaps() would be fine, if the DocBlock for $object_type described what was/was not allowed, but I think it'd be more "developer friendly" to just sanitize it.

So, register_taxonomy() should do the same?

https://core.trac.wordpress.org/ticket/14910 https://core.trac.wordpress.org/ticket/16600 https://core.trac.wordpress.org/ticket/28058

Note that it does sanitize_title_with_dashes() for rewrite slugs and query vars.

I probably shouldn't have mentioned a possible change to register_taxonomy(), doing so just muddied the water.

pbiron avatar Apr 29 '20 14:04 pbiron

@pbiron I wonder if this is something that should be addressed in core instead as @swissspidy already pointed at. With register_post_type already taking care of that, I think register_taxonomy should too. Sanitizing here instead would be merely patchwork in an attempt to cover what's rather broken there.

In addition to that, I don't think this is a common thing to do - while I'm sure it exists, it's certainly not recommended, I personally have never seen such a case anywhere.

I think we should focus these efforts on one of the linked core tickets and close this issue here instead.

felixarntz avatar May 01 '20 23:05 felixarntz