wp-sitemaps
wp-sitemaps copied to clipboard
Provider `$object_type` and `$sub_type` need to be sanitized for the rewrite rules to match in some cases
Describe the bug
The regexes used for the rewrite rules (both currently as in #150) would not match in certain cases without the $object_type
and $sub_type
properties of Core_Sitemaps_Provider
being sanitized (most likely via santize_title()
like the REST API does for it's routes).
To Reproduce Steps to reproduce the behavior:
I came across these cases while testing #150...I tried them just to see what would happen :-) Since the buggy behavior they represent wasn't really in scope for #150 I figured it would be better to open a separate issue than include something about this in my review of that PR.
- register a public taxonomy with spaces in the name (e.g.,
register_taxonomy( 'this is a test', 'post' );
- add a term to that taxonomy and assign it to at least one post
- Load the sitemap index in your browser
- See something like
https://example.com/wp-sitemap-taxonomies-this%20is%20a%20test-1.xml
- Click on that link and see that you get a 404 (because the rewrite rule wasn't matched...since it's regex doesn't allow a space)
Similarly,
- create and register a custom provider, whose
$object_type = 'this is a test';
- add the other necessary methods to the custom provider so that it both shows up in the index (e.g., return >= 1 from
max_num_pages()
) andget_url_list()
returns an array with at least 1loc
). - Load the sitemap index in your browser
- See something like
https://example.com/wp-sitemap-this%20is%20a%20test-1.xml
- Click on that link and see that you get a 404 (because the rewrite rule wasn't matched...since it's regex doesn't allow a space)
Expected behavior
I'd expect to see:
https://example.com/wp-sitemap-taxonomies-this-is-a-test-1.xml
and
https://example.com/wp-sitemap-this-is-a-test-1.xml
in both step 4's above and that clicking on those links correctly shows the relevant sitemaps
Additional context
I know the examples above are kind of pathological, but more real world cases are certainly possible.
Also, note that similar case for CPTs is handled correctly (e.g., register_post_type( 'this is atest', array( 'public' => true ) )
because register_post_type()
sanitizes the post type (with sanitize_key()
.) So, register_taxonomy()
should do the same? Although even in that case, custom provider $object_type
should still be sanitized.
Object sub types (e.g. custom taxonomies):
Hmm, we could do sanitize_key()
in \Core_Sitemaps_Provider::get_sitemap_url
, but then we also need to do it in places using \Core_Sitemaps_Provider::get_object_sub_types
Object types (e.g. custom providers):
Not sure we should really do something here. The object_type
property is only really used for rewrites. If someone adds spaces there, they're doing_it_wrong()
So,
register_taxonomy()
should do the same?
https://core.trac.wordpress.org/ticket/14910 https://core.trac.wordpress.org/ticket/16600 https://core.trac.wordpress.org/ticket/28058
Note that it does sanitize_title_with_dashes()
for rewrite slugs and query vars.
Object sub types (e.g. custom taxonomies):
Hmm, we could do
sanitize_key()
in\Core_Sitemaps_Provider::get_sitemap_url
, but then we also need to do it in places using\Core_Sitemaps_Provider::get_object_sub_types
I haven't had time to investigate where the sanitizing should happen. but those 2 seem good places to start.
And I would think that sanitize_title()
would be more in line with current WP practice than sanitize_key()
, as in both places we're talking about URLs.
Object types (e.g. custom providers):
Not sure we should really do something here. The
object_type
property is only really used for rewrites. If someone adds spaces there, they'redoing_it_wrong()
It's not just spaces, e.g., $object_type = 'Foo';
would also fail, given that the rewrite regex is ([a-z]+?)
.
Adding a _doing_it_wrong()
in Core_SItemaps::register_sitemaps()
would be fine, if the DocBlock for $object_type
described what was/was not allowed, but I think it'd be more "developer friendly" to just sanitize it.
So,
register_taxonomy()
should do the same?https://core.trac.wordpress.org/ticket/14910 https://core.trac.wordpress.org/ticket/16600 https://core.trac.wordpress.org/ticket/28058
Note that it does
sanitize_title_with_dashes()
for rewrite slugs and query vars.
I probably shouldn't have mentioned a possible change to register_taxonomy()
, doing so just muddied the water.
@pbiron I wonder if this is something that should be addressed in core instead as @swissspidy already pointed at. With register_post_type
already taking care of that, I think register_taxonomy
should too. Sanitizing here instead would be merely patchwork in an attempt to cover what's rather broken there.
In addition to that, I don't think this is a common thing to do - while I'm sure it exists, it's certainly not recommended, I personally have never seen such a case anywhere.
I think we should focus these efforts on one of the linked core tickets and close this issue here instead.