Marek Gagolewski

118 comments by Marek Gagolewski

I wonder if `strength=2` is what you might need:

```r
stringi::stri_detect_coll(c("Mario", "mario", "Mário", "mário"), "mario", strength = 2L, case_level = TRUE, locale = "pt_BR")
## [1] TRUE TRUE FALSE FALSE
stringi::stri_detect_coll(c("Mario", "mario", "Mário",...
```

instead, `locale=''` meant `locale="POSIX"`, which is why it worked as expected (and perhaps this is what PostgreSQL uses, hence the correct results). I would recommend setting `locale="POSIX"` explicitly then....

Hmmm.... interestingly, a collator-based string comparison honours the above rule...

```r
> stringi::stri_cmp_equiv(c("Mario", "mario", "Mário", "mário"), "mario", case_level=TRUE, strength=2L)
[1] FALSE TRUE FALSE FALSE
> stringi::stri_cmp_equiv(c("Mario", "mario", "Mário", "mário"), "mario",...
```

I was trying hard to figure out why `usearch` returns a different result below, but with no success. A bug in ICU?

```r
stringi::stri_detect_coll(c("Mario", "mario", "Mário", "mário"), "mario", case_level=TRUE, strength=1L)...
```

[note to self] Yes, this is reproducible outside of stringi:

```c++
/* g++ -std=c++11 icu_test_bug_ucol_caselevel.cpp -licui18n -licuuc -licudata && ./a.out */
#include
#include
#include
#include
#include
#include
using namespace icu;...
```

All right, it turns out that this issue has already been reported. It is ICU-related: https://unicode-org.atlassian.net/browse/ICU-21338

`dim`, `names` and `dimnames`? See `mostattributes` in `?attributes`.
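A minimal base R sketch of what that help page describes, as I understand it: `mostattributes<-` copies attributes over but sets `dim`, `names` and `dimnames` only when they are valid for the target, whereas a plain `attributes<-` errors out. The objects `src` and `x` and the `units` attribute are made up for illustration and are not from the original discussion.

```r
# Hypothetical source object: a 2x2 matrix with dimnames and a custom attribute
src <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("c", "d")))
attr(src, "units") <- "cm"

# Target of a different length (6), so dim = c(2, 2) cannot apply to it
x <- 1:6

# Plain attributes<- tries to set dim as well and fails
plain <- tryCatch({
    y <- x
    attributes(y) <- attributes(src)
    "ok"
}, error = function(e) "error")

# mostattributes<- skips the invalid dim/dimnames and keeps the rest
mostattributes(x) <- attributes(src)

print(plain)                 # "error"
print(names(attributes(x)))  # "units"
```

If the target had length 4 instead, `dim` and `dimnames` would be carried over too, since they would then be valid.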

I was actually thinking about giving `stringi` a major re-write for quite a long time. Now that the Windows-UCRT build of R assumes all strings are natively UTF-8, and the...