
Add missing Unicode categories in Python library

Open joprice opened this issue 1 year ago • 0 comments

This adds the missing Unicode categories to fix the error ValueError: Fable error, unknown Unicode category: Ps, which is raised when calling, for example, Char.IsLetterOrDigit with a left parenthesis '('.

I used the values defined in the referenced doc, https://docs.microsoft.com/en-us/dotnet/api/system.globalization.unicodecategory?view=net-6.0. While doing so, I also found that No was assigned to UnicodeCategory.OtherLetter instead of UnicodeCategory.OtherNumber.
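For illustration, here is a minimal sketch of the kind of mapping involved. It is not the code from the Fable Python library: the names UnicodeCategory, CATEGORY_BY_ABBREVIATION, get_unicode_category and is_letter_or_digit are hypothetical. The numeric values follow the referenced .NET doc, and the letter-or-digit check assumes the .NET rule that the five letter categories plus DecimalDigitNumber count.

```python
import unicodedata
from enum import IntEnum


class UnicodeCategory(IntEnum):
    """Numeric values as listed in the .NET UnicodeCategory documentation."""
    UppercaseLetter = 0
    LowercaseLetter = 1
    TitlecaseLetter = 2
    ModifierLetter = 3
    OtherLetter = 4
    NonSpacingMark = 5
    SpacingCombiningMark = 6
    EnclosingMark = 7
    DecimalDigitNumber = 8
    LetterNumber = 9
    OtherNumber = 10
    SpaceSeparator = 11
    LineSeparator = 12
    ParagraphSeparator = 13
    Control = 14
    Format = 15
    Surrogate = 16
    PrivateUse = 17
    ConnectorPunctuation = 18
    DashPunctuation = 19
    OpenPunctuation = 20
    ClosePunctuation = 21
    InitialQuotePunctuation = 22
    FinalQuotePunctuation = 23
    OtherPunctuation = 24
    MathSymbol = 25
    CurrencySymbol = 26
    ModifierSymbol = 27
    OtherSymbol = 28
    OtherNotAssigned = 29


C = UnicodeCategory

# Hypothetical lookup table (not Fable's actual code): two-letter general
# category abbreviations, as returned by unicodedata.category, mapped to
# the enum above. Note that "No" maps to OtherNumber, not OtherLetter.
CATEGORY_BY_ABBREVIATION = {
    "Lu": C.UppercaseLetter, "Ll": C.LowercaseLetter, "Lt": C.TitlecaseLetter,
    "Lm": C.ModifierLetter, "Lo": C.OtherLetter,
    "Mn": C.NonSpacingMark, "Mc": C.SpacingCombiningMark, "Me": C.EnclosingMark,
    "Nd": C.DecimalDigitNumber, "Nl": C.LetterNumber, "No": C.OtherNumber,
    "Zs": C.SpaceSeparator, "Zl": C.LineSeparator, "Zp": C.ParagraphSeparator,
    "Cc": C.Control, "Cf": C.Format, "Cs": C.Surrogate, "Co": C.PrivateUse,
    "Pc": C.ConnectorPunctuation, "Pd": C.DashPunctuation,
    "Ps": C.OpenPunctuation, "Pe": C.ClosePunctuation,
    "Pi": C.InitialQuotePunctuation, "Pf": C.FinalQuotePunctuation,
    "Po": C.OtherPunctuation,
    "Sm": C.MathSymbol, "Sc": C.CurrencySymbol, "Sk": C.ModifierSymbol,
    "So": C.OtherSymbol, "Cn": C.OtherNotAssigned,
}

# Assumed .NET semantics for Char.IsLetterOrDigit: any letter category
# or a decimal digit.
LETTER_OR_DIGIT = {
    C.UppercaseLetter, C.LowercaseLetter, C.TitlecaseLetter,
    C.ModifierLetter, C.OtherLetter, C.DecimalDigitNumber,
}


def get_unicode_category(ch: str) -> UnicodeCategory:
    """Map a single character to its UnicodeCategory value."""
    abbreviation = unicodedata.category(ch)
    try:
        return CATEGORY_BY_ABBREVIATION[abbreviation]
    except KeyError:
        raise ValueError(f"unknown Unicode category: {abbreviation}") from None


def is_letter_or_digit(ch: str) -> bool:
    """Sketch of Char.IsLetterOrDigit on top of the lookup table."""
    return get_unicode_category(ch) in LETTER_OR_DIGIT
```

With a complete table like this, get_unicode_category("(") resolves to OpenPunctuation instead of raising, and a No character such as "½" resolves to OtherNumber.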

I'm not sure how to test the surrogate category: I hit an error that I didn't get a chance to look into, and I left a note about it.

The test data was created by running the following script:

import sys
import unicodedata
from collections import defaultdict

# Group every code point by its two-letter general category (e.g. "Lu", "Ps").
unicode_category = defaultdict(list)
for c in map(chr, range(sys.maxunicode + 1)):
    unicode_category[unicodedata.category(c)].append(c)

# Print one sample per category: the first (lowest) code point, escaped.
for value in unicode_category.values():
    c = value[0]
    e = c.encode("unicode_escape")
    print(repr(e), unicodedata.category(c))

which groups characters by category and then prints each category with a sample value (the first character found in that category, in escaped form).

joprice, Jul 23 '24 00:07