Comments and aliases problem
The page U+018D states
The character is also known as reversed Polish-hook o.
However, there is no such formal alias.
On the other hand Fileformat.Info contains the following comments:
reversed Polish-hook o archaic phonetic for labialized alveolar fricative recommended spellings U+007A U+02B7 or U+007A U+032B
Looks like the first comment was converted into an alias while the other ones have been skipped.
I don't think this is correct :-( How are the comments imported, and how are they processed? Shouldn't they be displayed just as "Unicode comments"?
It's a verbatim copy from this file in the Unicode standard:
http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt
I don't know where Unicode got the alias name for this one from. When I google it, the most prominent results are your mails to the Unicode mailing list ;-)
In said file, lines prefixed with = are alias names according to the standard (like the "reversed Polish-hook o"), while others (like the * prefixed ones) are other informative data, comments, ... (I opened #49 in order to add this missing info, too.)
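To illustrate the prefix convention described above, here is a minimal sketch of how such an entry could be split into aliases and comments. It is not the importer used by the site; the function name `parse_nameslist` and the embedded sample are assumptions for illustration, and the sample assumes continuation lines are tab-indented and cite code points as bare hex digits.

```python
import re

# Hypothetical sample modelled on the U+018D entry discussed in this thread.
# Assumption: continuation lines start with a tab, then "= " or "* ".
SAMPLE = (
    "018D\tLATIN SMALL LETTER TURNED DELTA\n"
    "\t= reversed Polish-hook o\n"
    "\t* archaic phonetic for labialized alveolar fricative\n"
    "\t* recommended spellings 007A 02B7 or 007A 032B\n"
)

NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def parse_nameslist(text):
    """Collect '=' lines as informal aliases and '*' lines as comments."""
    entries = {}
    current = None
    for line in text.splitlines():
        match = NAME_LINE.match(line)
        if match:
            # A new character entry starts: code point, tab, name.
            current = {"name": match.group(2), "aliases": [], "comments": []}
            entries[int(match.group(1), 16)] = current
        elif current is not None and line.startswith("\t= "):
            current["aliases"].append(line[3:].strip())
        elif current is not None and line.startswith("\t* "):
            current["comments"].append(line[3:].strip())
        elif not line.startswith("\t"):
            # Any other non-indented line (e.g. a block header) ends the entry.
            current = None
    return entries


entry = parse_nameslist(SAMPLE)[0x018D]
print(entry["aliases"])
print(entry["comments"])
```

Keeping the two lists separate is what would allow the comments to be shown under their own heading instead of being dropped.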
I found many of those quite useful to get a generic idea of the character (see, e.g., the low ASCII control characters, or the guillemets), so I embedded the aliases in the character description.
This works quite well for almost all characters. This is the first one where the alias seems off.
The best way to fix this would be to file an upstream issue with Unicode. A changed NamesList.txt would automatically lead to a fix here. Have you tried that already, by chance, after asking on the mailing list last year?
If that doesn't lead to a result, we could add additional info between the character description and the Wikipedia entry to describe why the alias is problematic. If you write one or two sentences, I'd add them as the file codepoints.net/data/U+018D.en.md.
As a last resort I could hotfix the database to remove the alias, but I'd rather stick to the standard as closely as possible. (Also, the alias might sneak in again in a later import.)
Let's make haste slowly.
I had checked only NameAliases.txt; I was not aware (or had forgotten) that aliases are also defined in NamesList.txt. I will file an issue with Unicode, perhaps after discussing the problem on the Unicode list (it was already on my TODO list).
I'm glad you also plan to handle the other information from the file.
Some time in the future I would like to include the information from https://bitbucket.org/jsbien/unicode4polish/wiki/codes/U+018D_LATIN_SMALL_LETTER_TURNED_DELTA However, I don't have a clear idea how to do it in an elegant and extensible way.
Thanks to the thread about NamesList.txt on the Unicode list, I've come to the conclusion that we have to distinguish between formal aliases (from NameAliases.txt) and informal aliases (from NamesList.txt). So instead of
The character is also known as reversed Polish-hook o.
we should have something like
The Unicode version ??? mentions an informal alias: reversed Polish-hook o.
I've added the Unicode version because, as far as I understand, the annotations are not stable and may vanish.
Since I parse the data anew with every new version, the informal alias would then vanish here, too. So I guess we could leave that out. Apart from that, I very much like the idea of rewording it like this.
What about
The character is also called a reversed Polish-hook
versus e.g.
The character is also known as SYRIAC SUBLINEAR COLON SKEWED LEFT
The second example is an official alias. As for the first one, nobody knows the character as a reversed Polish-hook o, especially since there is no such thing as a Polish hook: even in English, the diacritic mark is called an ogonek. The name is just the individual usage of one author of a perhaps obsolete book on phonology.
Moreover, I don't like vanishing information. I would very much appreciate a note stating in or since which versions the comment has appeared. The proposed wording allows for it, e.g.
The character is also called a reversed Polish-hook (in Unicode 4.1.0 and later versions)
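As a rough sketch of how the two wordings proposed in this thread could be produced side by side, assuming formal aliases come from NameAliases.txt and informal ones from NamesList.txt: the helper `describe_aliases` and its parameters are hypothetical, and the version string is only the example used above, not a verified fact.

```python
def describe_aliases(formal, informal, since=None):
    """Build the alias sentences, keeping formal and informal aliases apart.

    formal   -- aliases taken from NameAliases.txt (assumed input)
    informal -- '=' annotations taken from NamesList.txt (assumed input)
    since    -- optional Unicode version in which the annotation first appeared
    """
    sentences = []
    if formal:
        sentences.append(
            "The character is also known as " + ", ".join(formal) + "."
        )
    if informal:
        # The version note is only added when the caller supplies a version.
        note = f" (in Unicode {since} and later versions)" if since else ""
        sentences.append(
            "The character is also called " + ", ".join(informal) + note + "."
        )
    return " ".join(sentences)


print(describe_aliases([], ["reversed Polish-hook o"], since="4.1.0"))
print(describe_aliases(["SYRIAC SUBLINEAR COLON SKEWED LEFT"], []))
```

Passing the version as an optional argument keeps the note out of the text whenever the first-appearance version is not known.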