
Generate clean slugs

Open • lypwig opened this issue 7 years ago • 5 comments

When we create a new blog, a slug is generated from its title (e.g. My blog becomes myBlog).

To avoid character encoding in URLs, this slug should not contain special characters:

"Slugs may be entirely lowercase, with accented characters replaced by letters from the English alphabet and whitespace characters replaced by a dash or an underscore to avoid being encoded. Punctuation marks are generally removed, and some also remove short, common words such as conjunctions." (Wikipedia)
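A minimal Python sketch of the slug style described in that quote (the function name `slugify` is hypothetical, and this is not Plume's code): NFKD normalization splits accented letters into a base letter plus combining marks, which are then dropped; punctuation is removed and runs of whitespace become a single hyphen. It does not remove short common words, which the quote mentions as optional.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical lowercase, hyphen-separated, ASCII-only slug."""
    # NFKD splits "ë" into "e" + U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, then anything still outside ASCII.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, keeping letters, digits, whitespace and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_only.lower())
    # Collapse whitespace/underscore/hyphen runs into one hyphen, trim the ends.
    return re.sub(r"[\s_-]+", "-", cleaned).strip("-")
```

With this sketch, `slugify("Blog de Nathanaël")` gives `blog-de-nathanael`.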

For instance, the blog title Blog de Nathanaël becomes ~BlogDeNathanaël, so the URL is percent-encoded to https://fediverse.blog/~/BlogDeNathana%C3%ABl/, which is hard for a human to read.
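The encoding in that URL is ordinary percent-encoding of the UTF-8 bytes: ë is the two bytes 0xC3 0xAB. Python's `urllib.parse.quote` reproduces it (this only illustrates the encoding, not how Plume builds its URLs):

```python
from urllib.parse import quote, unquote

# "BlogDeNathanaël", written with a precomposed ë (U+00EB).
slug = "BlogDeNathana\u00ebl"

# quote() encodes non-ASCII characters as UTF-8, then percent-escapes each byte.
encoded = quote(slug)
print(f"https://fediverse.blog/~/{encoded}/")

# The mapping is reversible.
print(unquote(encoded) == slug)
```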

Also, it is easy to spoof an identity by using a title with look-alike letters (for instance, 𝖻а𝗍 looks identical to bat but uses three different characters)... and there are 337,968,125,414,970,750,000,000 ways to write my blog name using Unicode confusable characters. ;)
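One possible way to build such a look-alike from explicit code points, in Python (these are assumed stand-ins; the exact characters in the example above may differ). It also shows that Unicode NFKC normalization is not enough on its own: it folds the mathematical letters back to plain b and t, but the Cyrillic а remains.

```python
import unicodedata

# Assumed stand-ins for the look-alike example, built from escapes
# so the exact code points are unambiguous.
lookalike = "\U0001D5BB\u0430\U0001D5CD"
plain = "bat"

for ch in lookalike:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC maps the mathematical sans-serif letters to "b" and "t",
# but the Cyrillic "а" (U+0430) is left untouched.
folded = unicodedata.normalize("NFKC", lookalike)
print(folded == plain)  # False
```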

By convention, most slugs use hyphen-separated lowercase words.

lypwig • Sep 17 '18 12:09

"By convention, most slugs use hyphen-separated lowercase words."

That's what we do for article slugs, but since blog slugs are also used as ActivityPub actor names, I prefer to have them in CamelCase (for instance, if we allow mentioning blogs in articles in the future, this will be more consistent with usernames, which rarely contain hyphens in place of spaces).
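A hypothetical Python sketch of the CamelCase style described above (the name `camel_case_slug` and its `ascii_only` flag are illustrative, not Plume's actual implementation): each whitespace-separated word loses its punctuation and gets its first letter upper-cased. The optional flag shows the ASCII-folding alternative discussed in this issue.

```python
import unicodedata


def camel_case_slug(title: str, ascii_only: bool = False) -> str:
    """Hypothetical CamelCase actor-name slug."""
    if ascii_only:
        # Optional ASCII folding: strip diacritics, then drop any non-ASCII left.
        decomposed = unicodedata.normalize("NFKD", title)
        title = "".join(c for c in decomposed if not unicodedata.combining(c))
        title = title.encode("ascii", "ignore").decode("ascii")
    words = []
    for word in title.split():
        # Keep only letters and digits inside each word (drops punctuation).
        word = "".join(c for c in word if c.isalnum())
        if word:
            # Upper-case the first letter only; keep the rest as typed.
            words.append(word[0].upper() + word[1:])
    return "".join(words)
```

With this sketch, `camel_case_slug("Blog de Nathanaël")` keeps the ë (`BlogDeNathanaël`), while `ascii_only=True` gives `BlogDeNathanael`.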

elegaanz • Sep 17 '18 18:09

I think I will open a debate on Loomio for this issue. Even though I agree that we shouldn't make phishing or impersonating someone else easy, I don't think we can really use something like Punycode, or write something that transforms non-ASCII characters into ASCII. I feel like we should accept the risk of impersonation/phishing, but I'm not sure that's actually a good idea.
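For reference, Python's standard library ships a `punycode` codec (RFC 3492) and an `idna` codec (IDNA 2003). This sketch only illustrates why Punycode hurts readability: the result is ASCII and reversible, but the non-ASCII letters are moved into an opaque suffix.

```python
# "BlogDeNathanaël", written with a precomposed ë (U+00EB).
slug = "BlogDeNathana\u00ebl"

# Raw Punycode: ASCII bytes, with the ë encoded after the last "-".
puny = slug.encode("punycode")
print(puny)

# The transformation is reversible.
print(puny.decode("punycode") == slug)

# The idna codec also lower-cases (nameprep) and adds the "xn--" prefix,
# which is how such names appear in domain names.
ace = slug.encode("idna")
print(ace)
```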

elegaanz • Mar 04 '19 22:03

Here is the Loomio discussion: https://framavox.org/d/d5P7oepg/slugs

elegaanz • Mar 05 '19 21:03

This algorithm may be useful to solve this issue in a way that both avoids security issues and allows characters outside of ASCII: https://wiki.mozilla.org/IDN_Display_Algorithm
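A much simplified Python sketch of the core idea behind that algorithm: show the Unicode form when all letters of a slug come from a single script, and fall back to Punycode otherwise. Assumptions: the script is guessed from the first word of the Unicode character name, which is only a heuristic (Python's standard library does not expose the Unicode Script property), and the real Mozilla algorithm also allows some script combinations (e.g. for CJK) and applies further checks not modeled here. The function names are hypothetical.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Heuristic script guess from the character name (not the real Script property)."""
    # "LATIN SMALL LETTER E WITH DIAERESIS" -> "LATIN"
    # "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_single_script(slug: str) -> bool:
    # Only letters are checked; digits, hyphens, etc. are treated as neutral.
    scripts = {script_of(c) for c in slug if c.isalpha()}
    return len(scripts) <= 1


def display_slug(slug: str) -> str:
    # Show Unicode for single-script slugs, otherwise fall back to Punycode.
    if is_single_script(slug):
        return slug
    return slug.encode("punycode").decode("ascii")
```

With this sketch, `BlogDeNathanaël` and `Straße` are displayed as-is, while a Latin/Cyrillic mix such as `bаt` is shown in Punycode.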

elegaanz • Aug 01 '19 17:08

It's relevant for German, too: we have the umlauts ä, ü, ö and the ß.
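This matters for ASCII folding: the usual German transliteration writes ä, ö, ü, ß as ae, oe, ue, ss, whereas naive diacritic stripping turns ä into a and, because ß has no Unicode decomposition, drops ß entirely. A small Python sketch (the table and function names are assumptions, not Plume behavior):

```python
import unicodedata

# Assumed German transliteration table (conventional, not Plume's).
GERMAN = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
})


def naive_ascii(text: str) -> str:
    # Strip combining marks, then drop any remaining non-ASCII character.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return kept.encode("ascii", "ignore").decode("ascii")


def german_ascii(text: str) -> str:
    # Apply the German table first, then fall back to naive folding.
    return naive_ascii(text.translate(GERMAN))


print(naive_ascii("Gr\u00f6\u00dfe"))   # Groe: the ß is lost entirely
print(german_ascii("Gr\u00f6\u00dfe"))  # Groesse
```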

thopanX • Jan 26 '20 11:01