owl
Feature request: Unicode properties
Owl is awesome, thank you!
My proposals:
- range(cp1, cp2) or range[cp1, cp2]: cp1 and cp2 are code points (hex or decimal)
- block(name): a Unicode block name (Basic_Latin, Latin-1_Supplement, etc.)
- property(name): a Unicode property name (White_Space, Hyphen, Ps, Mn, etc.)
- script(name): a Unicode script name (Common, Latin, etc.)
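To make the proposed range(cp1, cp2) concrete, here is a minimal C sketch of what it might check. It assumes that a range is inclusive, that it matches exactly one code point, that the input is UTF-8, and that a hex bound is written with a 0x prefix. The function names (utf8_decode, parse_codepoint, match_range) are hypothetical and not part of Owl.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode one UTF-8 code point from s into *cp. Returns the number of bytes
   consumed, or 0 at end of string or on a malformed sequence.
   (Overlong forms and surrogates are not rejected in this sketch.) */
size_t utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;
    uint32_t c;

    if (p[0] == 0)
        return 0;
    if (p[0] < 0x80) {
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0)      { len = 2; c = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; c = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; c = p[0] & 0x07; }
    else
        return 0;
    for (size_t i = 1; i < len; i++) {
        /* Stops at a NUL terminator too, since 0x00 is not a continuation byte. */
        if ((p[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (p[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Parse a range bound written in hex ("0x41") or decimal ("65").
   Base 10 is forced for non-0x input so a leading zero is not read as octal. */
uint32_t parse_codepoint(const char *text)
{
    int base = (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) ? 16 : 10;
    return (uint32_t)strtoul(text, NULL, base);
}

/* Assumed semantics of range(cp1, cp2): match one code point c with
   cp1 <= c <= cp2. Returns the bytes consumed, or 0 if there is no match. */
size_t match_range(const char *s, uint32_t cp1, uint32_t cp2)
{
    uint32_t c;
    size_t n = utf8_decode(s, &c);
    return (n > 0 && c >= cp1 && c <= cp2) ? n : 0;
}
```

For example, match_range("A", parse_codepoint("0x41"), parse_codepoint("90")) consumes one byte, while a two-byte character such as é (U+00E9) consumes two bytes when it falls inside the range.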
What do you think?
Something like this would be possible, but at the moment every token can be separated by whitespace. For example, with a rule like ident = property(ID_Start) property(ID_Continue)*, identifiers would match not only abc but also a b c d. The best way to handle custom identifiers right now is with user-defined tokens: you write a small C function and pass it to the generated parser, which calls it during tokenization.
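Here is a rough sketch of the kind of scanning a user-defined token function might do. It is not Owl's callback API: the exact signature the generated parser expects is not given here, so scan_identifier is a hypothetical helper you would wrap in that callback. It also uses ASCII letters, digits, and underscore as stand-ins for ID_Start and ID_Continue. Real Unicode identifiers would need property tables.

```c
#include <stddef.h>

/* ASCII stand-ins for the Unicode ID_Start / ID_Continue properties.
   A real implementation would consult Unicode property data instead. */
static int is_id_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static int is_id_continue(unsigned char c)
{
    return is_id_start(c) || (c >= '0' && c <= '9');
}

/* Return the length of the identifier at the start of s, or 0 if there is none.
   The whole identifier is scanned in one pass, so whitespace can never appear
   inside it, unlike a grammar rule built from separate per-character tokens. */
size_t scan_identifier(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n;

    if (!is_id_start(p[0]))
        return 0;
    n = 1;
    while (is_id_continue(p[n]))   /* the NUL terminator ends the loop */
        n++;
    return n;
}
```

With this approach, scan_identifier("abc") returns 3. For "a b c d" it returns 1, so only a is taken as an identifier instead of the whole spaced-out sequence.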