
Consider adding support for UCD (Unicode Character Database) rule patterns

Open skylerlee opened this issue 3 years ago • 3 comments

I think it would be good to add Unicode support, either through built-in rules (e.g. UPPERCASE_LETTER | LOWERCASE_LETTER) as pest does, or through Unicode property regex patterns (e.g. \p{Lu} | \p{Ll}).

I guess this would require either adding an external library such as PCRE or bundling an embeddable header file.

skylerlee · Sep 29 '22 03:09

I'm sorry for my very late response. I understand your suggestion, and I think there is a real need for it. However, I want to keep PackCC implemented as a single source file that depends on no external libraries other than the C standard library. That is just my own egotistic preference, not a rational reason. So I'm considering a way to realize this by introducing import functionality, which was requested in #50.

arithy · Apr 14 '24 05:04

@skylerlee, I have introduced the import functionality and bundled some import files. I think your needs can be met using unicode_general_category.peg. For example, if you need a rule that matches a sequence of uppercase letters, you can add the following lines to your PEG file:

%import "unicode_general_category.peg"

rule_uc <-  Unicode_Uppercase_Letter +

Note that you must add the directory containing unicode_general_category.peg to the import search path, using either the packcc command-line option -I or the environment variable PCC_IMPORT_PATH. For more details, see README.md.

arithy · Apr 21 '24 14:04

@skylerlee, I'd like to close this issue with the solution above.

arithy · Apr 25 '24 10:04