Consider adding support for UCD (Unicode Character Database) rule patterns
I think it would be good to add Unicode support, either through built-in rules (e.g. UPPERCASE_LETTER | LOWERCASE_LETTER) as pest does, or through Unicode property regex patterns (e.g. \p{Lu} | \p{Ll}).
I guess this would require either adding an external library such as PCRE or bundling an embeddable header file.
I'm sorry for my very late response. I understand your suggestion, and I also think there is a need for it. However, I want to keep PackCC a single-source-file implementation that does not depend on any external library other than the C standard library. This is purely my personal preference, not a rational requirement. So I'm thinking of realizing it by introducing import functionality, which was requested in #50.
@skylerlee, I have introduced the import functionality and bundled some import files.
I think unicode_general_category.peg should satisfy your needs.
For example, if you need a rule that matches a sequence of uppercase letters, you can insert the following lines in your PEG file:
%import "unicode_general_category.peg"
rule_uc <- Unicode_Uppercase_Letter +
Note that you must add the directory where unicode_general_category.peg is located to the import search path using the packcc command line option -I or the environment variable PCC_IMPORT_PATH. For more details, see README.md.
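As a rough sketch of the two ways to set the import search path mentioned above: the directory and grammar file names here are hypothetical placeholders, not paths taken from this thread, so substitute the directory that actually contains unicode_general_category.peg on your system.

```shell
# Hypothetical locations; adjust both to your own setup.
IMPORT_DIR="/path/to/packcc/import"   # directory holding unicode_general_category.peg
GRAMMAR="grammar.peg"                 # your PEG file containing the %import line

# Option 1: pass the import directory on the command line with -I.
packcc -I "$IMPORT_DIR" "$GRAMMAR"

# Option 2: set the PCC_IMPORT_PATH environment variable for this invocation.
PCC_IMPORT_PATH="$IMPORT_DIR" packcc "$GRAMMAR"
```

Either form should let the %import line resolve. README.md remains the authoritative description of the lookup rules.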
@skylerlee, I'd like to close this issue with the above.