German Umlauts in links not correctly encoded
Describe the bug
German umlauts are not correctly encoded in loc URLs.
To reproduce
Steps to reproduce the behaviour:
- Go to our blog-3-sitemap.xml
- Search for prediger-zu-besuch-bei-sattler and check the URL in <image:loc>
- See that it contains Körner
Expected behaviour
To avoid confusion and parsing errors, German umlauts should be percent-encoded in all sitemap URLs.
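As a rough illustration of the expected output, here is a minimal Python sketch (not the plugin's code). The function name `percent_encode_url` and the `example.com` URL are made up for this example, and Python's `urllib.parse.quote` stands in for whatever encoder the plugin would use. It leaves existing `%XX` escapes alone, so it can be applied to an already encoded URL without double-encoding.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters kept as-is in the path and query. "%" is included so that
# escapes which are already present are not encoded a second time.
SAFE = "/%:@!$&'()*+,;="


def percent_encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper for illustration only. The host name is left
    untouched; internationalized domains would need separate handling.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe=SAFE)          # UTF-8 bytes become %XX
    query = quote(parts.query, safe=SAFE + "?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Assumed example URL; only the "Körner" part comes from this report.
raw = "https://example.com/assets/Körner.jpg"
encoded = percent_encode_url(raw)
print(encoded)
```

With this input, the `ö` becomes `%C3%B6` (its two UTF-8 bytes), and running the function again on the result changes nothing.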
Technical details
In helpers/Sitemap.php:592, $asset->getUrl() already returns the correctly encoded URL for the asset. That URL is then run through UrlHelper::absoluteUrlWithProtocol(), which in turn calls TextHelper::sanitizeUserInput() further down the line, and that call rawurldecode()s the URL again in helpers/Text.php:340. This removes the percent-encoding again.
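The effect of that call chain can be mimicked in a few lines of Python. This is only a sketch of the behaviour described above, not the plugin's code: `sanitize_step` is a hypothetical stand-in for the sanitizing step, `urllib.parse.unquote` plays the role of PHP's `rawurldecode()`, and the `example.com` URL is an assumption.

```python
from urllib.parse import quote, unquote


def sanitize_step(url: str) -> str:
    """Hypothetical stand-in for the sanitizing step: it URL-decodes its input."""
    return unquote(url)


# What the asset URL is reported to look like before sanitizing (already encoded).
asset_url = "https://example.com/assets/K%C3%B6rner.jpg"

# Decoding turns %C3%B6 back into a raw "ö".
after_sanitize = sanitize_step(asset_url)
print(after_sanitize)

# One possible remedy: encode again after sanitizing, keeping ":", "/"
# and existing "%" escapes untouched.
restored = quote(after_sanitize, safe=":/%")
print(restored)
```

Encoding after the sanitizing step, rather than before it, is one way to keep the percent-encoding in the final sitemap; skipping the decode for asset URLs would be another.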
Screenshots
Versions
- Plugin version: dev-develop-v5 as 5.1.13
- Craft version: 5.8.15
If you run your sitemap through a sitemap validator, does it present you with any errors?
e.g.:
https://www.xml-sitemaps.com/validate-xml-sitemap.html
https://validator.w3.org/
...or when looking in your Google Search Console, does it report any errors with the sitemap in question?
@khalwat No, they do not give any errors. It's just that our SEO audit recommended having all non-ASCII characters encoded according to RFC 3986 to avoid potential errors with older crawlers.
Maybe you could add an option in the Sitemap settings for that?
So this is probably something I should fix. All modern browsers/platforms handle umlauts in URLs properly, but some very old legacy systems may not.
@khalwat Thanks for responding. We took care of it in the meantime by renaming the files. But I do agree with you that it is worth fixing, not only for compatibility reasons but also for future occurrences.