why does security/.../removals.txt not work with Age?

Open markusicu opened this issue 1 year ago • 2 comments

I just added the following to the security data input file removals.txt (PR #777). Why does the Age property not seem to work here? I also tried a simple \p{Age=16} without the intersection with the list of scripts. No effect either. @macchiati ideas?

# PAG meeting 2024-04-18 before Unicode 16 beta:
# [Mark]: Policy is that by default
# new characters in scripts that are not Excluded or Limited Use,
# are marked as Uncommon_Use & communicate to SEW
# to ask if there are any exceptions (needed in customary modern widespread use).
# ----
# https://www.unicode.org/reports/tr31/#Table_Recommended_Scripts
# ----
# TODO: This should work with the following set pattern but doesn't;
# and neither with \p{Age=16}. Why?
# [\P{Age=15.1}&[\p{script=Zyyy}\p{script=Zinh}\p{script=Arab}\p{script=Armn}\p{script=Beng}\p{script=Bopo}\p{script=Cyrl}\p{script=Deva}\p{script=Ethi}\p{script=Geor}\p{script=Grek}\p{script=Gujr}\p{script=Guru}\p{script=Hang}\p{script=Hani}\p{script=Hebr}\p{script=Hira}\p{script=Kana}\p{script=Knda}\p{script=Khmr}\p{script=Laoo}\p{script=Latn}\p{script=Mlym}\p{script=Mymr}\p{script=Orya}\p{script=Sinh}\p{script=Taml}\p{script=Telu}\p{script=Thaa}\p{script=Thai}\p{script=Tibt}]] ; uncommon_use

markusicu • May 03 '24 04:05

I'll have to look at it

macchiati • May 03 '24 14:05

The following works to get just the characters newer than version 14.0 and no newer than 15.1 (that is, 14.0 < Age ≤ 15.1):

[\p{script=Zyyy}\p{script=Zinh}\p{script=Arab}\p{script=Armn}\p{script=Beng}\p{script=Bopo}\p{script=Cyrl}\p{script=Deva}\p{script=Ethi}\p{script=Geor}\p{script=Grek}\p{script=Gujr}\p{script=Guru}\p{script=Hang}\p{script=Hani}\p{script=Hebr}\p{script=Hira}\p{script=Kana}\p{script=Knda}\p{script=Khmr}\p{script=Laoo}\p{script=Latn}\p{script=Mlym}\p{script=Mymr}\p{script=Orya}\p{script=Sinh}\p{script=Taml}\p{script=Telu}\p{script=Thaa}\p{script=Thai}\p{script=Tibt} &\p{Age=15.1} -\p{Age=14.0}]
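For readers unfamiliar with why the pattern subtracts one Age set from another, here is a minimal Python sketch. It assumes that \p{Age=V} is cumulative, that is, it matches every character assigned in version V or earlier, which is how stock ICU treats Age and what the pattern above implies. It is a toy model, not the unicodetools implementation; `SAMPLE_AGES`, `age_at_most`, and `added_between` are hypothetical names, and the table holds only a handful of sample code points.

```python
# Toy model of cumulative Age matching. This is NOT the unicodetools code;
# all names here are hypothetical and the data is a tiny illustrative sample.

# Sample code points mapped to the Unicode version that introduced them.
SAMPLE_AGES = {
    0x0041: (1, 1),    # LATIN CAPITAL LETTER A
    0x1FAE0: (14, 0),  # MELTING FACE
    0x1FAE8: (15, 0),  # SHAKING FACE
    0x2FFC: (15, 1),   # an ideographic description character
    0x1FAE9: (16, 0),  # FACE WITH BAGS UNDER EYES
}


def age_at_most(version):
    # Models \p{Age=version} under the ASSUMPTION that the property is
    # cumulative: everything assigned in `version` or any earlier version.
    return {cp for cp, age in SAMPLE_AGES.items() if age <= version}


def added_between(after, up_to):
    # Models [\p{Age=up_to} - \p{Age=after}]:
    # characters newer than `after` and no newer than `up_to`.
    return age_at_most(up_to) - age_at_most(after)


# Characters in the sample with 14.0 < Age <= 15.1.
new_since_14 = added_between((14, 0), (15, 1))
print(sorted(f"U+{cp:04X}" for cp in new_since_14))
```

Under the same assumption, the characters new in 16.0 would be `added_between((15, 1), (16, 0))` rather than a bare Age=16 test, which in this model would also match every older character.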

BTW, the following is a more concise way to list the scripts, if you are using the modified version of UnicodeSet in unicodetools:

\p{script=Zyyy}\p{script=Zinh}\p{script=Arab}\p{script=Armn}\p{script=Beng}\p{script=Bopo}\p{script=Cyrl}\p{script=Deva}\p{script=Ethi}\p{script=Geor}\p{script=Grek}\p{script=Gujr}\p{script=Guru}\p{script=Hang}\p{script=Hani}\p{script=Hebr}\p{script=Hira}\p{script=Kana}\p{script=Knda}\p{script=Khmr}\p{script=Laoo}\p{script=Latn}\p{script=Mlym}\p{script=Mymr}\p{script=Orya}\p{script=Sinh}\p{script=Taml}\p{script=Telu}\p{script=Thaa}\p{script=Thai}\p{script=Tibt}

==>

\p{script=/Zyyy|Zinh|Arab|Armn|Beng|Bopo|Cyrl|Deva|Ethi|Geor|Grek|Gujr|Guru|Hang|Hani|Hebr|Hira|Kana|Knda|Khmr|Laoo|Latn|Mlym|Mymr|Orya|Sinh|Taml|Telu|Thaa|Thai|Tibt/}

I often wish we had that in the stock ICU...
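The rewrite from the long list to the regex-style value is mechanical, so it can be sketched as a small string-building helper. This only generates pattern text; it does not parse or evaluate it, and the `/.../` value syntax is, per the comment above, specific to the modified UnicodeSet in unicodetools rather than stock ICU. `long_form` and `concise_form` are hypothetical names.

```python
# Hypothetical helpers that build the two equivalent pattern spellings
# from a list of script codes. They produce text only.

SCRIPTS = (
    "Zyyy Zinh Arab Armn Beng Bopo Cyrl Deva Ethi Geor Grek Gujr Guru "
    "Hang Hani Hebr Hira Kana Knda Khmr Laoo Latn Mlym Mymr Orya Sinh "
    "Taml Telu Thaa Thai Tibt"
).split()


def long_form(scripts):
    # One \p{script=XXXX} item per script, concatenated.
    return "".join(f"\\p{{script={s}}}" for s in scripts)


def concise_form(scripts):
    # Single \p{script=/A|B|.../} item using the regex-style value,
    # which only the modified UnicodeSet in unicodetools is said to accept.
    return "\\p{script=/" + "|".join(scripts) + "/}"


print(concise_form(SCRIPTS))
```

Keeping the script codes in one list means both spellings stay in sync if a script is added or dropped.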

macchiati • May 04 '24 19:05