Propertiness
A classification of properties derived from presence in PropertyAliases, or from a field that we are forced to fill in in ExtraPropertyAliases (contrast PropertyStatus.java, which is out of date).
In character.jsp, split the information into four groups (UCD properties, non-UCD properties, UCD non-properties, non-UCD non-properties), with Unihan further split out of the UCD properties and placed after the UCD non-properties. See it in staging:
- https://unicode-jsps-staging-o2ookmn2oq-uc.a.run.app/UnicodeJsps/character.jsp?a=A7FE
- https://unicode-jsps-staging-o2ookmn2oq-uc.a.run.app/UnicodeJsps/character.jsp?a=3400
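To illustrate the classification described above, here is a minimal sketch of the four-way split. It is not the code in this PR: the class name `PropertinessSketch`, the enum constants, and the contents of the two lookup tables are assumptions made for illustration, standing in for what would be read from PropertyAliases.txt and ExtraPropertyAliases.txt.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and table contents are illustrative only.
final class PropertinessSketch {
    /** The four buckets shown in character.jsp. */
    enum Propertiness {
        UCD_PROPERTY,
        NON_UCD_PROPERTY,
        UCD_NON_PROPERTY,
        NON_UCD_NON_PROPERTY
    }

    // Stand-in for the names listed in PropertyAliases.txt (assumed sample).
    static final Set<String> PROPERTY_ALIASES = Set.of("General_Category", "Script");

    // Stand-in for the mandatory field in ExtraPropertyAliases.txt.
    // "Confusable" and "Example_UCD_Non_Property" are placeholder names.
    static final Map<String, Propertiness> EXTRA_PROPERTY_ALIASES = Map.of(
            "Identifier_Status", Propertiness.NON_UCD_PROPERTY,
            "Identifier_Type", Propertiness.NON_UCD_PROPERTY,
            "Confusable", Propertiness.NON_UCD_NON_PROPERTY,
            "Example_UCD_Non_Property", Propertiness.UCD_NON_PROPERTY);

    /** Presence in PropertyAliases wins; otherwise the explicit field decides. */
    static Propertiness classify(String name) {
        if (PROPERTY_ALIASES.contains(name)) {
            return Propertiness.UCD_PROPERTY;
        }
        Propertiness explicit = EXTRA_PROPERTY_ALIASES.get(name);
        if (explicit == null) {
            // The field is mandatory, so a missing entry is an error rather than a default.
            throw new IllegalArgumentException("No propertiness recorded for " + name);
        }
        return explicit;
    }

    public static void main(String[] args) {
        System.out.println(classify("Script"));
        System.out.println(classify("Identifier_Type"));
    }
}
```

Throwing on a missing entry mirrors the idea that the field is something we are forced to fill in, so the classification cannot silently go stale.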
I'd suggest that the top be the properties on https://www.unicode.org/reports/tr18/#RL2.7, perhaps with those groupings.
Put all Contributory and Provisional into a separate bucket.
Not sure what the parens are for, as in (kEH_Core).
Some values don't have links, e.g. "Obsolete":
Identifier_Status Restricted Identifier_Type Obsolete
If you are going to have a bucket Non-UCD properties for U+A7FE, then add confusable, emoji, ...
Will look it over more tomorrow.
> Not sure what the parens are for
Provisional, see the heading Normative, Informative, Contributory, and (Provisional) UCD properties.
> I'd suggest that the top be the properties on https://www.unicode.org/reports/tr18/#RL2.7, perhaps with those groupings.
Finer property status (splitting out Contributory etc.) and groupings would be nice, but we do not have a maintainable way of keeping track of it so far (there was an attempt with PropertyStatus.java, but as noted in the PR description, that did not work). Here I am instead doing what I can based on what we are forced to maintain, namely *PropertyAliases.txt.
> Some values don't have links, e.g. "Obsolete"
Yes, that is because it is multivalued, see https://github.com/unicode-org/unicodetools/pull/1018 item 2.
> If you are going to have a bucket Non-UCD properties for U+A7FE, then add confusable, emoji, ...
Confusable is there, it goes into Non-UCD non-properties (Other information). The Identifier_* stuff is what UTS39 actually describes as a property.
RGI_Emoji (but not RGI_Emoji_*_Sequence) should be there because it is described as a property in UTS51, but isn’t because it is hacked directly into the JSPs instead of being in IndexUnicodeProperties; I will add it later, see the TODOs in ExtraPropertyAliases.
> Here I am instead doing what I can based on what we are forced to maintain, namely *PropertyAliases.txt.
Note that beyond the cosmetics of grouping character.jsp, we actually want to keep track of the « is this a UCD property » information, see #1049.
Note: I tried splitting out Provisional from Normative+Informative, but having them in two blocks seemed counterproductive for Unihan and Unikemet (which are the only places where we have Informative properties); hence the parentheses approach.
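As a rough illustration of the parentheses approach, the sketch below keeps all properties in one block and wraps the names of provisional ones in parentheses. The class and method names are hypothetical, as is `kExample_Informative`; this is not the actual character.jsp code, only the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the labelling idea; not the actual JSP code.
final class ProvisionalLabelSketch {
    /** Returns the name as displayed: provisional properties get parentheses. */
    static String label(String name, boolean provisional) {
        return provisional ? "(" + name + ")" : name;
    }

    /** Renders one block, keeping provisional and other properties together. */
    static String block(List<String> names, List<Boolean> provisional) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            labels.add(label(names.get(i), provisional.get(i)));
        }
        return String.join(", ", labels);
    }

    public static void main(String[] args) {
        // kExample_Informative is a made-up name standing in for a non-provisional property.
        System.out.println(block(
                List.of("kExample_Informative", "kEH_Core"),
                List.of(false, true)));
    }
}
```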
> RGI_Emoji (but not RGI_Emoji_*_Sequence) should be there because it is described as a property in UTS51
Ah, never mind, I see UTS51 also describes the RGI_Emoji_*_Sequence zoo as properties. I’ll fix that.
As noted in the TODOs, I’d like to move RGI_Emoji and IDNA2008_Category into IndexUnicodeProperties (rather than being patched into the JSPs), and to add RGI_Emoji_Qualification, all of these being NonUcdProperty.
But I will do that in a subsequent PR.
@markusicu Friendly ping, since I think some of @jowilco’s work is blocked on this.