Misinterpreted `@missing` line in pre-15 emoji-data

Open eggrobin opened this issue 1 year ago • 0 comments

@macchiati pointed out to me that https://util.unicode.org/UnicodeJsps/character.jsp?a=0020&history=full&showDevProperties=1 shows the following history for the Emoji property of U+0020:

Emoji 8.0..14.0: Yes 15.0..16.0α: No

and that the space was not, in fact, an Emoji between 2015 and 2021.

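I think the issue is that the earlier versions of emoji-data have an `@missing` line which does not follow the format of the file. It has three fields (range, property, value), whereas the data lines have only two (range, property):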

# @missing: 0000..10FFFF  ; Emoji ; No

0023          ; Emoji                # E0.0   [1] (#️)       hash sign
002A          ; Emoji                # E0.0   [1] (*️)       asterisk
0030..0039    ; Emoji                # E0.0  [10] (0️..9️)    digit zero..digit nine

The parser interprets that as `# @missing: 0000..10FFFF ; Emoji`, and throws the `No` into the U+1F5D1 🗑️ WASTEBASKET, which would explain why unlisted code points such as U+0020 end up as Yes in those versions. This needs some special handling in PropertyParsingInfo.java.
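To make the idea concrete, here is a rough sketch of what tolerant handling of such a line might look like. It is not the actual code in PropertyParsingInfo.java; the class `MissingLineSketch`, the nested `MissingDefault` type, and the `parse` method are all hypothetical names, and the assumption is simply that an `@missing` line may carry either two fields (range, value) or three (range, property, value).

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names exist in unicodetools.
final class MissingLineSketch {
    static final String PREFIX = "# @missing:";

    /** Default value declared by an @missing line. */
    static final class MissingDefault {
        final int start;
        final int end;
        final String property; // null when the line does not name a property
        final String value;

        MissingDefault(int start, int end, String property, String value) {
            this.start = start;
            this.end = end;
            this.property = property;
            this.value = value;
        }
    }

    /** Returns the parsed default, or null if the line is not an @missing line. */
    static MissingDefault parse(String line) {
        if (!line.startsWith(PREFIX)) {
            return null;
        }
        String body = line.substring(PREFIX.length());
        // Drop a trailing comment, if any.
        int hash = body.indexOf('#');
        if (hash >= 0) {
            body = body.substring(0, hash);
        }
        String[] fields = Arrays.stream(body.split(";"))
                .map(String::trim)
                .toArray(String[]::new);
        // Assumption: two fields means "range ; value",
        // three fields means "range ; property ; value".
        if (fields.length != 2 && fields.length != 3) {
            throw new IllegalArgumentException("Unexpected @missing line: " + line);
        }
        // The range is either a single code point or "start..end", in hex.
        String[] range = fields[0].split("\\.\\.");
        int start = Integer.parseInt(range[0], 16);
        int end = Integer.parseInt(range[range.length - 1], 16);
        String property = fields.length == 3 ? fields[1] : null;
        // The value is always the last field, so it is never silently dropped.
        String value = fields[fields.length - 1];
        return new MissingDefault(start, end, property, value);
    }
}
```

With the line quoted above, `parse` would yield the range 0000..10FFFF, property `Emoji`, and value `No`, rather than discarding the third field.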

eggrobin • Mar 14 '24 10:03