Remove htmlspecialchars
Context
Described in issue https://github.com/Yoast/wordpress-seo/issues/18953
htmlspecialchars breaks urls for images that have GET parameters.
Summary
This PR can be summarized in the following changelog entry:
- Allow urls with query parameters in some upload fields
Relevant technical choices:
htmlspecialcharsmodifies the$submitted_urland potentially breaks good urls. With that in mind, the line right after already sanitizes urls very thoroughly. There is no need for this.
Test instructions
First add this snippet to functions.php (or somewhere that will run for sure):
/**
* Pseudo set image quality. It just adds get params to image urls. In real life this would be far more complex.
*/
add_filter(
'wp_get_attachment_url',
function ( $attachment_url ) {
$image_url = add_query_arg(
[
'quality' => 90,
'grain' => 0.5,
],
$attachment_url
);
return $image_url;
}
);
With that in place, go to to the Yoast SEO menu -> General -> First-time configuration -> proceed to Site Representation -> add any image to the organization logo -> Save and continue should fail to work and prevent you from going to the next step (Social Profiles).
Test instructions for QA when the code is in the RC
See above
- [x] QA should use the same steps as above.
Impact check
This PR affects the following parts of the plugin, which may require extra testing:
- Images saved to different options, specifically: company_logo, person_logo, open_graph_frontpage_image, social-image-url-
Documentation
- [ ] I have written documentation for this change.
Quality assurance
- [x] I have tested this code to the best of my abilities
- [ ] I have added unit tests to verify the code works as intended
- [ ] If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
- [x] I have written this PR in accordance with my team's definition of done.
Fixes #18953
Thank you for the PR.
Function htmlspecialchars converts only &, ", ', <, > to HTML entities. Method WPSEO_Utils::sanitize_url encodes is more robust approach to check URL (security, invalid chars,...), so it's fine to just remove htmlspecialchars.
There is similar code in class-wpseo-option-social.php: https://github.com/Yoast/wordpress-seo/blob/f62622e59e0a7c33d69916eb4d43b32cfe22568b/inc/options/class-wpseo-option-social.php#L258-L259
@stodorovic There-in lies the problem! & should stay as-is. I'll add a commit to fix that other file too.
Issue is:
- The browser doesn't convert
&into&when making requests - I don't understand the option saving process 100%, but at the very last step you reach https://github.com/Yoast/wordpress-seo/blob/trunk/inc/options/class-wpseo-options.php#L516 and right here there is a comparison of the saved value versus the submitted value, if they aren't the same a "failure" is created. The value saved is the validated url which went through htmlspecialchars and is now different than the submitted value that didn't have
&(for example)
", ', <, and > should be URL encoded, not turned into an html entity. WPSEO_Utils::sanitize_url handles this correctly. The & in particular is part of the url and separates query parameters. It should only be urlencoded if it is part of the key or value.
I don't see & in URLs after I've applied this PR. I think too that sanitize_url works fine - there are PHP unit tests.
Only issue which you can have in method save_option is encoded character(s) - e.g. , will be encoded as %2c (WP core/sanitize_url - lowercase), but PHP functions rawurlencode or urlencode returns %2C (uppercase). Technically these URLs are same, but string comparation fails.
I probably don't have same enviromment as you and I don't see other issues.
@stodorovic I can see the issue you are describing as potential, however, it would not occur with this changeset. htmlspecialchars only deals with &"'<>. The string case shouldn't be manipulated. Things like commas would be handled somewhere in WPSEO_Utils which is not modified.
Am I missing something?
CR & AC ✅