Generate clean slugs
When we create a new blog, a slug is generated from its title (e.g. My blog becomes myBlog).
To avoid character encoding issues, this slug should not contain special characters:
Slugs may be entirely lowercase, with accented characters replaced by letters from the English alphabet and whitespace characters replaced by a dash or an underscore to avoid being encoded. Punctuation marks are generally removed, and some also remove short, common words such as conjunctions. Wikipedia
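As a rough illustration of the rules quoted above, here is a minimal Python sketch. The `slugify` helper is hypothetical and only uses the standard library; it is not how Plume currently builds slugs.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical helper: build a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented letters (e.g. "ë" becomes "e" + a combining diaeresis).
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Discard anything still outside ASCII (there is no transliteration table here).
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(slugify("Blog de Nathanaël"))  # blog-de-nathanael
```

Note that letters without a Unicode decomposition, such as ß, are simply dropped by this approach, so a real implementation would probably need a transliteration table.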
For instance, the blog title Blog de Nathanaël becomes ~BlogDeNathanaël, so the URL is percent-encoded as
https://fediverse.blog/~/BlogDeNathana%C3%ABl/ which is hard to read for a human.
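The encoded form above can be reproduced with Python's standard `urllib.parse.quote`, which percent-encodes the UTF-8 bytes of each non-ASCII character. This is only meant to show where `%C3%AB` comes from:

```python
from urllib.parse import quote, unquote

# "ë" is two bytes in UTF-8 (0xC3 0xAB), hence two percent-escapes.
encoded = quote("BlogDeNathanaël")
print(encoded)  # BlogDeNathana%C3%ABl

# Decoding gives the original slug back.
print(unquote(encoded))  # BlogDeNathanaël
```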
Also, it is easy to spoof an identity by using a title with similar-looking letters (for instance 𝖻а𝗍 looks identical to bat, but uses 3 different characters)... and there are 337,968,125,414,970,750,000,000 ways to write my blog name using Unicode confusable characters. ;)
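To make the spoofing example concrete, the sketch below inspects a look-alike of "bat". The three code points are an assumption about which characters were used (two mathematical sans-serif letters and a Cyrillic "а"); other confusables would behave similarly.

```python
import unicodedata

# Assumed code points: sans-serif "b", Cyrillic "a", sans-serif "t".
spoofed = "\U0001D5BB\u0430\U0001D5CD"

# Print each character's code point and official Unicode name.
for char in spoofed:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")

# The string looks like "bat" but compares unequal.
print(spoofed == "bat")  # False

# NFKC folds the mathematical letters to plain "b" and "t",
# but the Cyrillic letter has no compatibility mapping and survives.
folded = unicodedata.normalize("NFKC", spoofed)
print(folded == "bat")  # False
```

So Unicode normalization alone is not enough to detect this kind of impersonation; a dedicated confusables check would be needed.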
By convention, most slugs use hyphen-separated lowercase words.
That's what we do for article slugs, but since blog slugs are also used as ActivityPub actor names, I prefer to keep them CamelCased (for instance, if we allow mentioning blogs in articles in the future, it will be more consistent with usernames, which rarely contain hyphens in place of spaces).
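For comparison, a CamelCased ASCII slug could be sketched as follows. `camel_case_slug` is a hypothetical name, and stripping accents this way is only one possible choice, not something already decided in this thread.

```python
import re
import unicodedata


def camel_case_slug(title: str) -> str:
    """Hypothetical helper: join the words of a title in CamelCase, ASCII only."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Encoding to ASCII with "ignore" drops the marks and any other non-ASCII.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep alphanumeric runs as words; spaces and punctuation act as separators.
    words = re.findall(r"[A-Za-z0-9]+", ascii_only)
    # Uppercase only the first letter so existing capitals are preserved.
    return "".join(word[:1].upper() + word[1:] for word in words)


print(camel_case_slug("Blog de Nathanaël"))  # BlogDeNathanael
```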
I think I will open a debate on Loomio about this issue, because even though I agree that we shouldn't make it easy to phish or impersonate someone else, I don't think we can really use something like Punycode, or build something that converts non-ASCII characters to ASCII. I feel like we should accept the risk of impersonation/phishing, but I don't know if that is actually a good idea.
Here is the Loomio discussion: https://framavox.org/d/d5P7oepg/slugs
This algorithm may be useful for solving this issue in a way that both avoids security issues and allows characters outside of ASCII: https://wiki.mozilla.org/IDN_Display_Algorithm
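One idea behind that kind of algorithm is to accept non-ASCII names as long as they do not mix scripts. The sketch below is a very rough approximation of that idea only: it guesses the script from the first word of each character's Unicode name, which is an assumption made to stay within Python's standard library. A real implementation would rely on the Unicode Script property and would likely allow some script combinations.

```python
import unicodedata


def scripts_used(label: str) -> set[str]:
    """Hypothetical helper: crude set of scripts seen among the letters of a label."""
    scripts = set()
    for char in label:
        if char.isalpha():
            # Approximation: "LATIN SMALL LETTER E WITH DIAERESIS" -> "LATIN".
            scripts.add(unicodedata.name(char, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """Accept a label only if all of its letters appear to share one script."""
    return len(scripts_used(label)) <= 1


print(is_single_script("BlogDeNathanaël"))  # True
print(is_single_script("b\u0430t"))  # False (Latin mixed with Cyrillic)
```

With a check like this, accented Latin titles would stay readable while mixed-script look-alikes could be rejected or flagged.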
It's relevant for German, too. We have umlauts äüö and the ß.