unicodetools
Misinterpreted `@missing` line in pre-15 emoji-data
@macchiati pointed out to me that https://util.unicode.org/UnicodeJsps/character.jsp?a=0020&history=full&showDevProperties=1 shows the following history for the Emoji property of U+0020:
| Emoji | 8.0..14.0: Yes | 15.0..16.0α: No |
|---|---|---|
and that the space was not, in fact, an Emoji between 2015 and 2021.
I think the issue is that the earlier versions of emoji-data have an `@missing` line which does not follow the format of the file:
# @missing: 0000..10FFFF ; Emoji ; No
0023 ; Emoji # E0.0 [1] (#️) hash sign
002A ; Emoji # E0.0 [1] (*️) asterisk
0030..0039 ; Emoji # E0.0 [10] (0️..9️) digit zero..digit nine
The parser interprets that as `# @missing: 0000..10FFFF ; Emoji`, which presumably makes Emoji=Yes the default for every code point, and throws the `No` into the U+1F5D1 🗑️ WASTEBASKET.
This needs some special handling in PropertyParsingInfo.java.
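
As a rough sketch of what that special handling could look like (this is not the actual code in PropertyParsingInfo.java; the class `MissingLineSketch` and the method `defaultValue` are hypothetical names made up for illustration), the parser could accept both the two-field form `range ; value` and the three-field form `range ; property ; value` seen in the older emoji-data files:

```java
/** Hypothetical helper for illustration only; not part of unicodetools. */
final class MissingLineSketch {
    private static final String PREFIX = "# @missing:";

    /**
     * Returns the default value declared by an @missing line for the given
     * property, or null if the line is not an @missing line or names a
     * different property.
     */
    static String defaultValue(String line, String propertyName) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(PREFIX)) {
            return null; // ordinary data line or plain comment
        }
        String body = trimmed.substring(PREFIX.length());
        // Drop a trailing comment, if any.
        int hash = body.indexOf('#');
        if (hash >= 0) {
            body = body.substring(0, hash);
        }
        String[] fields = body.split(";");
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        if (fields.length == 3) {
            // Three-field form: range ; property ; value
            // (the shape used by the pre-15 emoji-data files).
            return fields[1].equals(propertyName) ? fields[2] : null;
        }
        if (fields.length == 2) {
            // Two-field form: range ; value
            return fields[1];
        }
        return null; // unrecognised shape; leave it to the caller
    }
}
```

With this, `defaultValue("# @missing: 0000..10FFFF ; Emoji ; No", "Emoji")` yields `No` instead of treating `Emoji` as the default. How the result would be wired into the existing parsing code is left open here.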