
make_name needs a stable output variant

Open jace opened this issue 3 years ago • 2 comments

The make_name function is used to generate URL slugs in two distinct scenarios:

  1. To generate a nice URL for some content where the exact URL doesn't matter as long as it's nice (readable, no %xx escapes, etc.)
  2. To generate a stable URL representation of a piece of text such as a label or a wiki page name

In the second use case, calling make_name at any point in the future must generate exactly the same output as it does today. The generator spec must be versioned in case it ever needs to change. Third-party libraries such as unidecode cannot be used unless they offer the same guarantee. The checkused option should not be available in this variant: if two distinct pieces of text generate the same output, they should be considered the same text (for instance, when they differ only in capitalization or in optional diacritics for Latin script-based languages).
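
A rough sketch of what a versioned, dependency-free variant could look like. The names `stable_name`, `_stable_name_v1` and `_V1_FOLD` are hypothetical and not part of coaster, and the tiny folding table is only illustrative. Note that `str.lower()` still relies on Python's Unicode tables, so a real spec would probably need to pin that step too.

```python
import re

# Hypothetical frozen v1 table. It must never be edited once released;
# any change means registering a new version below instead.
_V1_FOLD = str.maketrans({"é": "e", "è": "e", "ü": "u", "ñ": "n", "ç": "c"})
_V1_SEPARATORS = re.compile(r"[^a-z0-9]+")


def _stable_name_v1(text: str) -> str:
    """Version 1 rules: lowercase, fold listed diacritics, hyphenate the rest."""
    folded = text.lower().translate(_V1_FOLD)
    return _V1_SEPARATORS.sub("-", folded).strip("-")


# Registry of frozen generator specs, keyed by version number.
_GENERATORS = {1: _stable_name_v1}


def stable_name(text: str, version: int = 1) -> str:
    """Return a name using the rules frozen under the given version."""
    try:
        generator = _GENERATORS[version]
    except KeyError:
        raise ValueError(f"unknown stable_name version: {version}") from None
    return generator(text)
```

There is deliberately no checkused-style hook here: `stable_name("Café Menu")` and `stable_name("CAFE  menu!")` both give `cafe-menu`, and the caller treats them as the same name.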

jace, Apr 19 '21 18:04

Diacritics have semantic meaning in some Latin-script representations such as Pinyin, so it may not be wise to strip them. However, diacritics in English content URLs are an irritation, so the right behaviour also depends on the use case, which should be an input parameter (in addition to the version number) when generating URLs.
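
To illustrate the use-case parameter, a minimal sketch follows. `name_for`, `fold_diacritics` and the `"english"`/`"pinyin"` labels are made up for this example. It leans on the standard `unicodedata` module for brevity, and that module's data changes between Python releases, so it would not meet the stability guarantee as written.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def name_for(text: str, use_case: str = "english") -> str:
    """Build a name, keeping or stripping diacritics based on the use case."""
    normalized = unicodedata.normalize("NFC", text.strip().lower())
    if use_case == "english":
        # Diacritics are an irritation in English content URLs: strip them.
        normalized = fold_diacritics(normalized)
    elif use_case != "pinyin":
        # In Pinyin the tone marks carry meaning, so they are kept as-is.
        raise ValueError(f"unknown use case: {use_case}")
    return "-".join(normalized.split())
```

With this, `name_for("Résumé tips")` gives `resume-tips`, while a Pinyin phrase passed with `use_case="pinyin"` keeps its tone marks.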

jace, Apr 19 '21 18:04

Reference: MediaWiki's approach, where spaces are replaced with underscores but no other transformation occurs.
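
A sketch of that approach as described here (only the space replacement; MediaWiki's actual title handling may do more). `wiki_style_name` is a hypothetical name; `urllib.parse.quote` is from the standard library.

```python
from urllib.parse import quote


def wiki_style_name(text: str) -> str:
    """Replace spaces with underscores and leave everything else untouched."""
    return text.replace(" ", "_")


def wiki_style_url_segment(text: str) -> str:
    """Percent-encode the name for use in a URL path (UTF-8 bytes)."""
    return quote(wiki_style_name(text))
```

The output is trivially stable because nothing is folded, but case and diacritics stay significant and non-ASCII text turns into %xx escapes in the URL: `wiki_style_url_segment("Café menu")` gives `Caf%C3%A9_menu`.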

jace, Jul 13 '22 08:07