Creating CPT does not respect locale specific replacements
When I create a new CPT, my input is corrected as I type. For example, capital letters are converted to lower case and special characters are replaced with an underscore "_".
Unfortunately these corrections do not include locale-specific replacements, such as the ones for German in WordPress core:
https://github.com/WordPress/WordPress/blob/a39daee51f4ce334c1d597f82f05783f7b20b50c/wp-includes/formatting.php#L1952-L1958
For example: ü=>ue, ä=>ae, ß=>ss, etc.
The function remove_accents contains more replacements than just those for German. It would be great if these locale-specific replacements were used in CPT UI too.
Thanks for considering!
Hi @Zodiac1978
If I'm reading correctly, you're not worrying about the fact that we correct values on the slugs. You're instead saying that we have cases where we're missing some and you're able to use those German-language focused characters in slugs and not get them fixed to match?
Hi @tw2113
thanks for asking!
> and you're able to use those German-language focused characters in slugs and not get them fixed to match?
Not exactly. Those umlauts (üöä) or the "ß" get replaced, but not in the correct way. At the moment the string "üöäß" will get replaced by "uoa_". It should be "ueoeaess".
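To make the expected result concrete, here is a minimal sketch of the German mapping in JavaScript. The helper name `transliterateGerman` and the map constant are hypothetical and exist in neither CPT UI nor WordPress core. The lower case entries and "ß" are the replacements mentioned above; the upper case entries are my assumption of what the matching rules would look like.

```javascript
// Hypothetical helper for illustration only; not CPT UI or WordPress core code.
// Lower case entries and "ß" come from the replacements named in this thread;
// the upper case entries are an assumed counterpart.
const GERMAN_REPLACEMENTS = {
  'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
  'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
  'ß': 'ss'
};

// Replace each German special character with its two-letter transliteration.
// Assumes the input uses precomposed characters (one code unit per umlaut).
function transliterateGerman(value) {
  return value.replace(/[ÄÖÜäöüß]/g, function (ch) {
    return GERMAN_REPLACEMENTS[ch];
  });
}

console.log(transliterateGerman('üöäß')); // ueoeaess
```

A lookup map keeps each rule on one line, so adding replacements for another locale would only mean adding entries.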
This is what happens in WordPress core through remove_accents. For German this happens in these lines:
https://github.com/WordPress/WordPress/blob/a39daee51f4ce334c1d597f82f05783f7b20b50c/wp-includes/formatting.php#L1952-L1958
There are some more special cases for other languages.
Is it clearer now?
I believe so. My biggest question was whether these characters were getting converted for you at all. They are, but we're handling some cases incorrectly, so we can definitely review how WP core handles them and add those cases to our own code to get things accurate and on par.
Thanks for the feedback.
I'm trying to make sure I'm looking in the correct place for what we need to revise, as best you can identify it.
Would it be here https://github.com/WebDevStudios/custom-post-type-ui/blob/master/src/js/cptui-scripts.js#L125-L133 in the diacritics section?
or
https://github.com/WebDevStudios/custom-post-type-ui/blob/master/src/js/cptui-scripts.js#L172-L174 with the cyrillic section?
I'm asking because I am comparing what we have against the 6 lines in your Core link and am not finding matches yet, so either we're missing them completely or they're in the diacritics section, which is a bit more obfuscated.
I am more familiar with the PHP way to fix those issues, but looking at these JavaScript lines, I would use the first part about diacritics. Umlauts are diacritics. Although "ß" is not a diacritic, I think this place would still be the best guess here.
In that JS, /[\300-\306]/g matches À, Á, Â, Ã, Ä, Å, Æ, which all get replaced with "A" (and so on for the other characters).
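If it helps with reading the other ranges, this small sketch lists the characters behind an octal range. The function name `charsInOctalRange` is made up for this example; it only relies on `String.fromCharCode` and `Number.prototype.toString(8)`.

```javascript
// Hypothetical helper: list each character in an octal escape range
// together with the escape that a regex character class would use for it.
function charsInOctalRange(start, end) {
  const result = [];
  for (let code = start; code <= end; code++) {
    result.push({
      octal: '\\' + code.toString(8),   // e.g. "\304"
      char: String.fromCharCode(code)   // e.g. "Ä"
    });
  }
  return result;
}

// 0o300 and 0o306 are the octal literals for the bounds of [\300-\306].
const range = charsInOctalRange(0o300, 0o306);
console.log(range.map(function (e) { return e.octal + ' ' + e.char; }).join(', '));
```

Running the same helper with the bounds of any other class from the diacritics section shows which characters that class covers.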
\304 (octal) is decimal 196, which is "Ä", and would need to be changed to "Ae" (or maybe in this case "ae" because you want lower case letters only).
"ß" is decimal 223 and therefore octal 337, so you can add a replacement for \337 to "ss".
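Put together, the change could look roughly like the sketch below. This is not the actual code from cptui-scripts.js: the function name `replaceGermanThenDiacritics` and the reduced set of fallback rules are assumptions for illustration. The key point is the order, since the German rules have to run before the generic diacritics ranges, otherwise "Ä" has already become "A" by the time the specific rule is reached.

```javascript
// Sketch only: function name and rule set are assumptions,
// not the real implementation in cptui-scripts.js.
function replaceGermanThenDiacritics(value) {
  return value
    // German-specific rules must run first.
    .replace(/[\304\344]/g, 'ae') // Ä (\304), ä (\344)
    .replace(/[\326\366]/g, 'oe') // Ö (\326), ö (\366)
    .replace(/[\334\374]/g, 'ue') // Ü (\334), ü (\374)
    .replace(/\337/g, 'ss')       // ß (\337)
    // Generic fallback for the remaining accented a/o/u variants.
    // \327 (×) and \367 (÷) are left out of the "o" class on purpose.
    .replace(/[\300-\306\340-\346]/g, 'a')
    .replace(/[\322-\326\330\362-\366\370]/g, 'o')
    .replace(/[\331-\334\371-\374]/g, 'u')
    // Slugs are lower case only.
    .toLowerCase();
}

console.log(replaceGermanThenDiacritics('üöäß')); // ueoeaess
```

Because the umlauts are consumed by the first four rules, the broader ranges after them can stay as they are and still catch characters like "à" or "Ó".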