Does stringi export something like `u_hasBinaryProperty(c, UCHAR_ALPHABETIC)`?
I am writing a parser for LaTeX code, and I'm hoping to support UTF-8 input. TeX and LaTeX assign each input character to a category, and one of those categories is "letter". I'm not sure how the Unicode-aware versions of LaTeX handle this, but one thing I wanted to try was the ICU test `u_hasBinaryProperty(c, UCHAR_ALPHABETIC)`. That's the only ICU function I need, so linking ICU into my package is possible but seems like overkill.
Does stringi provide this kind of categorization of the characters in a string? Ideally it would be something I could call from C, but if it's only available from R that would be very helpful too. I couldn't spot it in the reference docs, but maybe I just missed it.
As per Sec. 5.4.3 of *Writing R Extensions*, I've registered this function with `R_RegisterCCallable`, so other packages can retrieve it with `R_GetCCallable` (in the current development version of stringi). It's declared as

```c
int stric_u_hasBinaryProperty(int c, int which);
```
See https://github.com/gagolews/stringi/blob/master/src/stri_callables.cpp
Let me know if that works for you.
`UCHAR_ALPHABETIC` is 0 (see https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/uchar.h); this is very unlikely to change in the future.
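For the parser use case, here is a minimal sketch (not part of stringi) of how a function with the `stric_u_hasBinaryProperty` signature might be used to separate letters from other characters. Assumptions: `letter_or_other`, `has_binary_property_fn`, `PROP_ALPHABETIC` and the `CAT_*` constants are hypothetical names. The function pointer is assumed to be looked up once at run time inside your package, e.g. `(has_binary_property_fn) R_GetCCallable("stringi", "stric_u_hasBinaryProperty")` from `<R_ext/Rdynload.h>`, assuming that is the name it is registered under and that stringi is listed in your package's Imports so its namespace is loaded. The mapping used here (ASCII letters and Unicode Alphabetic code points get catcode 11, everything else 12) is a simplification that ignores TeX's other categories such as escape, braces and math shift.

```c
#include <stddef.h>

/* Same signature as the declaration of stric_u_hasBinaryProperty. */
typedef int (*has_binary_property_fn)(int c, int which);

/* ICU's UCHAR_ALPHABETIC, which is 0 in unicode/uchar.h. */
#define PROP_ALPHABETIC 0

/* TeX category codes used by this sketch: 11 = letter, 12 = other. */
enum { CAT_LETTER = 11, CAT_OTHER = 12 };

/* Hypothetical helper: classify one Unicode code point as letter/other.
   ASCII is handled directly (its only Alphabetic characters are A-Z and
   a-z), so the callable is consulted only for code points >= 0x80, and
   only if a non-NULL pointer was supplied. */
static int letter_or_other(int c, has_binary_property_fn has_prop)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return CAT_LETTER;
    if (c < 0x80 || has_prop == NULL)
        return CAT_OTHER;
    return has_prop(c, PROP_ALPHABETIC) ? CAT_LETTER : CAT_OTHER;
}
```

Passing `NULL` for `has_prop` keeps the sketch usable without stringi; in that case only ASCII letters are treated as letters. Note that the parser still has to decode the UTF-8 input into code points before calling it.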
Thanks! I'll give it a try.