unicodetools

Identifier_Type values for Unicode 17

Open • markusicu opened this issue 1 year ago • 1 comment

Unicode Tools tests are currently failing for lack of Identifier_Type values for new characters in Unicode 17:

unicodetools/src/main/resources/org/unicode/text/UCD/SecurityInvariantTest.txt

# https://www.unicode.org/reports/tr39/#Identifier_Status_and_Type

# “Unassigned characters, private use characters, surrogates, non-whitespace control characters.”
\p{Identifier_Type=Not_Character} = [\p{gc=Cn}\p{gc=Co}\p{gc=Cs}\p{gc=Cc}-\p{White_Space}]

-->

Expected empty, got: 4836	[\u088F\u09FF\u0B53\u0B54\u0C5C\u0CDC\u1ACF-\u1ADD\u1AE0-\u1AEB\u2B96\uA7CE\uA7CF\uA7D2\uA7D4\uA7F1\uFBC3-\uFBD2\uFD90\uFD91\uFDC8-\uFDCE\U00010940-\U0001095C\U00010EC5-\U00010EC7\U00010ED0-\U00010ED8\U00010EFA\U00010EFB\U00011B60-\U00011B67\U00011DB0-\U00011DDB\U00011DE0-\U00011DE9\U00016D80-\U00016D9D\U00016DA0-\U00016DA9\U00016EA0-\U00016EB8\U00016EBB-\U00016ED3\U00016FF2-\U00016FF6\U000187F8-\U000187FF\U00018D09-\U00018D1E\U00018D80-\U00018DF2\U0001CCFA-\U0001CCFC\U0001CEBA-\U0001CED0\U0001CEE0-\U0001CEF0\U0001E6C0-\U0001E6DE\U0001E6E0-\U0001E6F5\U0001E6FE\U0001E6FF\U0001F6D8\U0001F777-\U0001F77A\U0001F8D0-\U0001F8D8\U0001FA54-\U0001FA57\U0001FA8A\U0001FA8E\U0001FAC8\U0001FACD\U0001FADD\U0001FAEA\U0001FAEF\U0001FBFA\U0002B73A-\U0002B73E\U000323B0-\U00033479]

In	\p{Identifier_Type=Not_Character} 
But Not In	 [\p{gc=Cn}\p{gc=Co}\p{gc=Cs}\p{gc=Cc}-\p{White_Space}]
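The failure reads as follows: the listed code points are still Identifier_Type=Not_Character, but they are no longer in the right-hand set, presumably because Unicode 17 assigns them and their General_Category is no longer Cn. As a rough illustration of the right-hand side only, here is a minimal Python sketch, not the Unicode Tools implementation. The function and constant names are hypothetical, the White_Space control characters are hard-coded, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Cc characters that have White_Space=Yes: U+0009..U+000D and U+0085.
# Hard-coded because the unicodedata module does not expose White_Space.
# No Cn, Co, or Cs character is White_Space, so only these need subtracting.
WHITE_SPACE_CONTROLS = frozenset([*range(0x09, 0x0E), 0x85])

# General categories on the right-hand side of the invariant.
NOT_CHARACTER_CATEGORIES = frozenset({"Cn", "Co", "Cs", "Cc"})


def in_expected_not_character_set(code_point: int) -> bool:
    """Hypothetical helper: is code_point in the set
    [gc=Cn, gc=Co, gc=Cs, gc=Cc] minus White_Space?"""
    category = unicodedata.category(chr(code_point))
    return (
        category in NOT_CHARACTER_CATEGORIES
        and code_point not in WHITE_SPACE_CONTROLS
    )


if __name__ == "__main__":
    # The answer for newly assigned characters such as U+088F depends on
    # which Unicode version this Python build ships with.
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (0x0000, 0x0009, 0x0041, 0x088F, 0xE000, 0xFFFF):
        print(f"U+{cp:04X}", in_expected_not_character_set(cp))
```

Once an interpreter ships Unicode 17 data, the code points in the failure message should return False here, which is why they need a real Identifier_Type value rather than Not_Character.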

markusicu avatar Dec 04 '24 17:12 markusicu

For sample CI failures with more details, see:

  • https://github.com/unicode-org/unicodetools/pull/981

markusicu avatar Dec 04 '24 17:12 markusicu

Josh did this around March 2025.

markusicu avatar Apr 29 '25 19:04 markusicu