Marek Gagolewski
I wonder if `strength=2` is what you might need:

```r
stringi::stri_detect_coll(c("Mario", "mario", "Mário", "mário"), "mario", strength = 2L, case_level = TRUE, locale="pt_BR")
## [1] TRUE TRUE FALSE FALSE
stringi::stri_detect_coll(c("Mario", "mario", "Mário",...
```
First of all, thanks, there was a bug; `locale=""` should mean `locale=NULL`, i.e., your own locale, `pt_BR`.
Instead, `locale=''` meant `locale="POSIX"`, which is why it worked as expected (and perhaps this is what PostgreSQL uses, hence the correct results). I would recommend setting `locale="POSIX"` explicitly then....
Hmmm.... interestingly, a collator-based string comparison honours the above rule...

```r
> stringi::stri_cmp_equiv(c("Mario", "mario", "Mário", "mário"), "mario", case_level=TRUE, strength=2L)
[1] FALSE TRUE FALSE FALSE
> stringi::stri_cmp_equiv(c("Mario", "mario", "Mário", "mário"), "mario",...
```
I was trying hard to figure out why `usearch` returns a different result below, but with no success. A bug in ICU?

```r
stringi::stri_detect_coll(c("Mario", "mario", "Mário", "mário"), "mario", case_level=TRUE, strength=1L)...
```
(note to self): ICU 69.1 gives the results as above. @TODO: create a minimal reproducible example outside of stringi
[note to self] Yes, this is reproducible outside of stringi:

```c++
/* g++ -std=c++11 icu_test_bug_ucol_caselevel.cpp -licui18n -licuuc -licudata && ./a.out */
#include
#include
#include
#include
#include
#include
using namespace icu;...
```
All right, it turns out that this issue has already been reported. It is ICU-related. https://unicode-org.atlassian.net/browse/ICU-21338
`dim`, `names` and `dimnames`? See `mostattributes` in `?attributes`.
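A minimal base-R sketch of what that help page describes: the `mostattributes<-` replacement function copies attributes but assigns `dim`, `names` and `dimnames` only when they are valid for the target. The objects `src`, `y` and `z` and the attribute name `"note"` are made up for illustration; nothing here is specific to stringi.

```r
# Source object: a 2x2 matrix carrying an extra, user-defined attribute.
src <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))
attr(src, "note") <- "hello"   # "note" is a hypothetical attribute name

# Target of a different length (6), so dim = c(2, 2) cannot apply to it.
y <- 1:6
mostattributes(y) <- attributes(src)
names(attributes(y))   # only "note": dim and dimnames were silently skipped

# Target of a matching length (4): dim and dimnames are assigned as well.
z <- 5:8
mostattributes(z) <- attributes(src)
dim(z)

# By contrast, the plain replacement function would signal an error here:
# attributes(y) <- attributes(src)
```

This may be a convenient way to carry attributes from an input over to a result whose shape is not guaranteed to match.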
I was actually thinking about giving `stringi` a major re-write for quite a long time. Now that the Windows-UCRT build of R assumes all strings are natively UTF-8, and the...